Explicitly listing probabilities in a classification task

Jane Member Posts: 3 Contributor I
edited August 2019 in Help

I am using RapidMiner with a database of medical information to estimate the probability that a patient will be diagnosed with a certain class of ailment (e.g., gastrointestinal, cancer, respiratory) based on their sociodemographic data. My dataset contains almost one million records, each representing a patient. For each patient and each ailment category, I have the label "true" if the patient has been diagnosed with an ailment in that category and "false" if they have not.

I would like RapidMiner to learn the classification rules from a training set and then return, for each record in the test set, the probability that the record belongs to the class "true". I have found many useful tools for performing the classification, but I can't find a routine that reports the value of P(true) once the model has been applied. If anyone has any suggestions about how to do this, I would be very grateful. Thanks in advance!

-- Jane


    TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Jane,

    After learning the classification model on the training set, you can simply apply it to the data you wish to classify. Applying the model adds two columns to the example set. These contain the confidences (not, strictly speaking, the probabilities) that each example belongs to one class or the other.
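
To make the "apply the model, get two extra columns" idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner code: the learner (a small naive Bayes with Laplace smoothing), the attribute names, the toy records, and the helper functions `train_naive_bayes` and `apply_model` are all assumptions chosen for illustration, and the `confidence(true)` / `confidence(false)` column names only imitate the naming style of the confidence columns described above.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, label_key):
    """Count label frequencies and per-attribute value frequencies."""
    label_counts = Counter(row[label_key] for row in rows)
    # value_counts[label][attribute][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)
    for row in rows:
        for attr, value in row.items():
            if attr == label_key:
                continue
            value_counts[row[label_key]][attr][value] += 1
            values_seen[attr].add(value)
    return label_counts, value_counts, values_seen


def apply_model(model, rows):
    """Return copies of rows with a prediction and one confidence column per class."""
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    scored = []
    for row in rows:
        scores = {}
        for label, count in label_counts.items():
            score = count / total  # class prior
            for attr, value in row.items():
                if attr not in values_seen:
                    continue  # ignore attributes the model was not trained on
                seen = value_counts[label][attr][value]
                # Laplace smoothing so unseen combinations never score zero
                score *= (seen + 1) / (count + len(values_seen[attr]))
            scores[label] = score
        norm = sum(scores.values())
        out = dict(row)
        for label, score in scores.items():
            out[f"confidence({label})"] = score / norm
        out["prediction"] = max(scores, key=scores.get)
        scored.append(out)
    return scored


# Hypothetical toy data: one record per patient, label is "true" or "false".
train_rows = [
    {"age_group": "old", "smoker": "yes", "respiratory": "true"},
    {"age_group": "old", "smoker": "no", "respiratory": "false"},
    {"age_group": "young", "smoker": "yes", "respiratory": "true"},
    {"age_group": "young", "smoker": "no", "respiratory": "false"},
]
test_rows = [{"age_group": "old", "smoker": "yes"}]

model = train_naive_bayes(train_rows, label_key="respiratory")
scored = apply_model(model, test_rows)
for record in scored:
    print(record)
```

Each scored record keeps its original attributes and gains one confidence column per class; the `confidence(true)` value is the number to read off as the model's estimate for the "true" class. Whether that number can be treated as a calibrated probability depends on the learner, which is the caveat made in the reply above.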

    Jane Member Posts: 3 Contributor I
    Hi Tobias,

    Thanks so much for your help! After reading your response, I was able to find the appropriate columns in my data; they were just what I needed.

    -- Jane