
"Confidence values"

ferandi Member Posts: 9 Contributor II
edited June 2019 in Help
Hi friends,

I'm using RapidMiner to do text classification with the SVM (LibSVM), k-NN and Naive Bayes algorithms. When I get the results for my test data, I'm not sure how each algorithm calculates the confidence value of each instance for each class. Can anyone help me? I need this information for my article.

Thanks in advance.

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    this is different for each of those algorithms:

    Naive Bayes: the confidence is directly the probability calculated by the algorithm (actually, this is one of the rare cases where the confidence IS a real probability)
    k-NN: the confidence is the number of the k neighbors with the predicted class divided by k (the single values are weighted by distance in case of weighted predictions)
    SVM (I am not so sure about LibSVM, which uses another calculation in the multiclass case): for binomial classes, a good estimate of the probability of the positive class, which is also used by RapidMiner, is 1 / (1 + exp(-function_value)), where function_value is the SVM prediction
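The k-NN and binomial SVM rules described above can be sketched in a few lines of Python. This is only an illustration of the two formulas, not RapidMiner's actual code: the function names and the example labels are made up, the k-NN version covers only the unweighted case, and the LibSVM multiclass calculation is not covered.

```python
import math
from collections import Counter


def knn_confidences(neighbor_labels):
    """Unweighted k-NN: fraction of the k neighbors that carry each label."""
    k = len(neighbor_labels)
    counts = Counter(neighbor_labels)
    return {label: n / k for label, n in counts.items()}


def svm_confidence_positive(function_value):
    """Logistic squashing of the SVM function value: 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + math.exp(-function_value))


# Hypothetical labels of the k = 5 nearest neighbors of one test document.
print(knn_confidences(["sports", "sports", "politics", "sports", "economy"]))

# A function value of 0 lies exactly on the decision boundary.
print(svm_confidence_positive(0.0))
```

With the logistic formula, the confidence of the negative class is simply one minus the confidence of the positive class, so the two values always sum to 1.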

    Hope that helps,
    Ingo
  • ferandi Member Posts: 9 Contributor II
    Thank you very much Ingo!!!
  • ferandi Member Posts: 9 Contributor II
    Just one thing... what's the concept of confidence in text classification?


    Thanks
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    "what's the concept of confidence on text classification?"
    Well, pretty much the same as for all other kinds of classification tasks. The confidence describes how certain a prediction is. Although it is similar to the probability of predicting a specific class, it is most often not the same (with the exception of some learners like Naive Bayes).

    The same applies to text classification: the confidence of a class value states how certain the model is that a document belongs to this class.

    Cheers,
    Ingo
  • ferandi Member Posts: 9 Contributor II
    Hi, thank you very much for your help!
    I need to clarify some aspects of my project:

    I'm using three different methods to classify approximately 3000 documents into 11 categories. The methods are k-NN, Naive Bayes and SVM (LibSVM, linear kernel, C-SVC). After I submit the test documents, each method outputs a confidence (0-1) for the document in every category, and the category chosen is the one with the highest confidence.
    What I'm doing is summing the confidences of the document for each category across the 3 models and choosing the label with the highest total. I guess this is called bagging, right? Well, the fact is that my accuracy improved by about 2%. I'm still not sure how these confidence values are generated and normalized by RapidMiner in each model, which I need to know to support my conclusions. Do I have to normalize the values of each method before combining them, or can I consider them already normalized, so that my result makes sense?
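The combination described above (summing the per-category confidences of the three models and taking the largest total) might look like the following sketch. The confidence values and the two category names are invented for illustration, and the helper name is hypothetical. Note that this scheme is more commonly called soft voting or confidence averaging; bagging usually refers to training the same learner on bootstrap samples.

```python
def combine_confidences(per_model_confidences):
    """Sum the confidences per label across models and pick the largest total."""
    totals = {}
    for confidences in per_model_confidences:
        for label, value in confidences.items():
            totals[label] = totals.get(label, 0.0) + value
    best_label = max(totals, key=totals.get)
    return best_label, totals


# Hypothetical confidences of one document from the three models.
knn = {"sports": 0.60, "politics": 0.40}
naive_bayes = {"sports": 0.30, "politics": 0.70}
svm = {"sports": 0.45, "politics": 0.55}

label, totals = combine_confidences([knn, naive_bayes, svm])
print(label, totals)
```

The sum is only a fair vote if each model's confidences lie on a comparable scale, for example if they sum to 1 per document in every model. That is an assumption to check on the actual RapidMiner output rather than something this sketch guarantees.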

    Many thanks in advance!
  • jing_ma Member Posts: 2 Contributor I

    Ingo, is there any documentation available for helping understand each algorithm's definition of confidence? Thanks!

    Jing

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist

    Dear Jing,

     

    First of all: welcome to the community. There is no documentation on how our 250+ learners calculate confidence. Most of it can be found either in textbooks or in our code. Is there any operator in particular that we can help you with?

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • BenLie Member Posts: 1 Learner III

    Here, just look at the sample.

    Copied from the Help:


    Note that in the testing set, the attributes of the first example are Outlook = sunny and Wind = false. Naive Bayes does calculation for all possible label values and selects the label value that has maximum calculated probability.

    Calculation for label = yes

    Find product of following:

    Prior probability of label = yes (i.e. 9/14)
    value from distribution table when Outlook = sunny and label = yes (i.e. 0.223)
    value from distribution table when Wind = false and label = yes (i.e. 0.659)
    Thus the answer = 9/14*0.223*0.659 = 0.094

    Calculation for label = no

    Find product of following:

    Prior probability of label = no (i.e. 5/14)
    value from distribution table when Outlook = sunny and label = no (i.e. 0.581)
    value from distribution table when Wind = false and label = no (i.e. 0.397)
    Thus the answer = 5/14*0.581*0.397 = 0.082

    As the value for label = yes is the maximum of all possible label values, label is predicted to be yes.


    And this is how the confidence is calculated:

     

    conf(yes) = 0.094/(0.094+0.082) = 0.534

    conf(no) = 0.082/(0.094+0.082) = 0.466

     

    Without round-off error you get:

    [Attached image: Bayes.PNG]
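The normalization above can be reproduced with a short Python sketch that starts from the same numbers (9/14, 0.223, 0.659 and 5/14, 0.581, 0.397). The function name is made up for illustration; the sketch only restates the arithmetic of the Help example and is not RapidMiner's implementation.

```python
def normalize(scores):
    """Divide each per-label product by the sum over all labels."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Per-label products from the Help example, kept unrounded.
scores = {
    "yes": 9 / 14 * 0.223 * 0.659,
    "no": 5 / 14 * 0.581 * 0.397,
}

confidences = normalize(scores)
for label, value in confidences.items():
    print(f"conf({label}) = {value:.3f}")
```

Because the products are divided by their sum, the confidences of all labels always add up to 1, whatever the raw products are.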

     
