Should I see a decision tree in the ROC chart with default threshold of 0.5?

hscheel Member Posts: 1 Learner I
edited September 2022 in Help
I'm running the "Hotel App Performance Measurement Solution" process from TrainingResources. I thought I understood the ROC concept, but now I am confused by this specific example. From the confusion matrix (pasted at the bottom below), I think the FPR is 6% and the TPR is 33%, and I can find that point on the ROC chart (black lines crossing at (6%; 33%)). But I also thought that RapidMiner's binary classification threshold is 0.5, and the ROC point corresponding to a threshold of 0.5 is closer to (10%; 43%), as indicated by the green lines below.
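For reference, this is how I am reading the rates off the confusion matrix. The counts below are only placeholders chosen to reproduce those percentages; the real numbers are in my screenshot:

    # Placeholder counts, chosen only to reproduce the 6% / 33% rates above
    tp, fn = 33, 67        # actual positives: predicted positive / predicted negative
    fp, tn = 6, 94         # actual negatives: predicted positive / predicted negative

    tpr = tp / (tp + fn)   # true positive rate (recall)  -> 0.33
    fpr = fp / (fp + tn)   # false positive rate          -> 0.06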

Does this mean 0.5 is not the classification threshold used in this example? Or am I missing something else?

Any hint is greatly appreciated!

Thank you!
Holger.

 

Answers

  • Tripartio Member Posts: 37 Maven
    edited November 2022
    @hscheel, This is a great question. I had a similar question and after a lot of testing on different datasets and different processes, I think I've figured out how this works.

    I believe that your logic is sound, but you are probably looking at the wrong ROC chart. There are three possibly relevant ROC charts, with three distinct measures of AUC, from the Performance (Binominal Classification) and Performance operators. They differ in how they handle cases where two or more examples (cases, rows, observations) have the same probability estimate yet different true values:
    • AUC (optimistic) handles tied probability estimates by sorting the correct estimates (true positives) first, which boosts the AUC to a higher (optimistic) score.
    • AUC (pessimistic) handles tied probability estimates by sorting the wrong estimates (false positives) first, which pulls the AUC down to a lower (pessimistic) score.
    • AUC takes the average of AUC (optimistic) and AUC (pessimistic) as the single AUC score, which is meant to represent a random ordering of the ties (see the sketch after this list).
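    To make the tie handling concrete, here is a rough sketch in plain Python (not RapidMiner code; auc_with_tie_policy is just a name I made up for this illustration). It ranks the examples by confidence and resolves each tied group one way or the other:

        import numpy as np

        def auc_with_tie_policy(y_true, scores, policy="optimistic"):
            # Rank examples by confidence, highest first. Within a tied group,
            # "optimistic" puts positives first, "pessimistic" puts negatives first.
            y = np.asarray(y_true)
            s = np.asarray(scores)
            ranked = []
            for score in sorted(set(s), reverse=True):
                group = list(y[s == score])
                ranked.extend(sorted(group, reverse=(policy == "optimistic")))
            n_pos = sum(ranked)
            n_neg = len(ranked) - n_pos
            tp, area = 0, 0.0
            for label in ranked:
                if label == 1:
                    tp += 1                # ROC step up: TPR increases
                else:
                    area += tp / n_pos     # ROC step right: add the current TPR
            return area / n_neg

        y_true = [1, 1, 0, 1, 0, 0]
        scores = [0.9, 0.7, 0.7, 0.4, 0.4, 0.2]   # two tied groups with mixed labels
        print(auc_with_tie_policy(y_true, scores, "optimistic"))    # ~0.889
        print(auc_with_tie_policy(y_true, scores, "pessimistic"))   # ~0.667

    The plain AUC described above is the average of those two values (about 0.778 in this toy example).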
    However, as far as I can tell, the classification matrix in RapidMiner is based only on the AUC (optimistic) ROC, not on the average AUC ROC. So, try selecting the AUC (optimistic) ROC as an option in the Performance (Binominal Classification) operator.

    However, there is another important complication: when you use cross validation, as you did, the ROC thresholds might not match the classification matrix exactly, because the results shown do not represent a single classification but rather the average over the k folds of your cross validation. So, do not expect the threshold to match exactly with cross validation. To get the match I have described, run a single model without cross validation (split validation is fine); then the classification matrix should indeed correspond to a 0.5 threshold on the blue line in the ROC chart.
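    If you want to sanity-check that behaviour outside RapidMiner, here is a rough scikit-learn sketch (synthetic data and a plain decision tree, purely as an illustration, not the Hotel App process): with a single train/test split, the (FPR, TPR) pair from the 0.5-threshold confusion matrix is one of the corners of the ROC curve.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import confusion_matrix, roc_curve

        X, y = make_classification(n_samples=1000, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

        proba = model.predict_proba(X_te)[:, 1]
        tn, fp, fn, tp = confusion_matrix(y_te, (proba >= 0.5).astype(int)).ravel()
        print("matrix point:", fp / (fp + tn), tp / (tp + fn))   # (FPR, TPR) at threshold 0.5

        fpr, tpr, thr = roc_curve(y_te, proba)
        print(list(zip(thr, fpr, tpr)))   # the point above appears at the smallest threshold >= 0.5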

    Here's an example from one of my tests:

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    The ROC chart is not looking at the "one" threshold of 50% at all, just at the confidences. Each distinct confidence value produces a step in the chart.

    At each confidence level, the false positive rate and true positive rate are calculated.

    Try a k-NN with a few neighbors (e.g. 3) and without distance weighting. That restricts the possible confidence values to only a few, which makes it easier to calculate the points in the chart yourself.
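    As a rough illustration (a scikit-learn stand-in for the RapidMiner k-NN, with synthetic data): 3 neighbors without weighting means the confidences can only be 0, 1/3, 2/3 or 1, so the chart has only a handful of corners.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.metrics import roc_curve

        X, y = make_classification(n_samples=500, random_state=1)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
        knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_tr, y_tr)

        proba = knn.predict_proba(X_te)[:, 1]
        print(sorted(set(proba)))          # only a few distinct confidence values
        fpr, tpr, thr = roc_curve(y_te, proba)
        print(list(zip(thr, fpr, tpr)))    # one ROC corner per distinct confidence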

    Regards,

    Balázs