Precision-Recall Curves

amitd · July 2019

I am following up on a previous discussion on Precision-Recall Curves. I cannot seem to edit or reply to it, so I am creating a new thread with the same subject.

I have attached the test process as XML, which uses the Ripley dataset as an example. In a nutshell, I have optimized the threshold based on F-measure and then output the model performance for that "best threshold". I have also used the "ROC Curve to Example" operator from the Converters extension to generate 500 datapoints as an ExampleSet. I notice a discrepancy between the two results, and I would appreciate help reconciling it.

1. The best threshold turns out to be 0.642, which corresponds to a confusion matrix with TP=112, TN=104, FN=13, FP=21, giving recall = TPR = 89.6%, specificity = 83.2%, FPR = 1 - specificity = 16.8%, and F-measure = 86.82%.
2. The "ROC Curve to ExampleSet" output lists the FPR, TPR, and confidence threshold for each point. For TPR=89.6% and FPR=16.8%, the confidence (threshold) is listed as 0.356, which is puzzling. Also, for a confidence (threshold) of 0.643, TPR=66.4% and FPR=4.2%.
Why do the values from steps 1 and 2 not match up?
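
For anyone who wants to check the step 1 figures by hand, here is a minimal Python sketch that recomputes them from the confusion-matrix counts quoted above. It is not part of the RapidMiner process, and the variable names are only illustrative.

```python
# Confusion-matrix counts quoted in step 1 for the "best threshold".
tp, tn, fn, fp = 112, 104, 13, 21

recall = tp / (tp + fn)        # true positive rate (TPR)
specificity = tn / (tn + fp)   # true negative rate
fpr = 1 - specificity          # false positive rate
precision = tp / (tp + fp)
f_measure = 2 * precision * recall / (precision + recall)

print(f"recall={recall:.1%} specificity={specificity:.1%} "
      f"fpr={fpr:.1%} f_measure={f_measure:.2%}")
```

The counts reproduce the reported 89.6%, 83.2%, 16.8%, and 86.82%, so the confusion matrix itself is internally consistent; the mismatch lies in which threshold value is attached to it.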

amitd · July 2019

I think I figured it out. I had swapped the "first class" and "second class" parameters in the Create Threshold operator nested inside the Optimize Parameters (Grid) operator. It is counterintuitive, but apparently "first class" has to be set to the negative class (0 in this example) and "second class" to the positive class (1 in this example).
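
A rough Python sketch of why the swap would shift the threshold, under two assumptions that are not confirmed in this thread: that the threshold is compared against the confidence of the "second class", and that the two class confidences sum to 1. The function `apply_threshold` and the confidence values are hypothetical, made up for illustration.

```python
def apply_threshold(conf_second, threshold, first_class, second_class):
    """Assumed rule: predict second_class when its confidence exceeds the threshold."""
    return second_class if conf_second > threshold else first_class

# Hypothetical confidences for the positive class (label 1).
conf_pos = [0.10, 0.30, 0.40, 0.50, 0.70, 0.90]
t = 0.642  # the "best threshold" found with the swapped parameters

# Swapped setup: first class = 1, second class = 0,
# so the threshold is applied to confidence(0) = 1 - confidence(1).
swapped = [apply_threshold(1 - c, t, first_class=1, second_class=0) for c in conf_pos]

# The same predictions expressed as a cutoff on the positive-class confidence.
mirrored = [1 if c >= 1 - t else 0 for c in conf_pos]
equivalent_cutoff = round(1 - t, 3)

print(swapped == mirrored, equivalent_cutoff)
```

Under these assumptions, a threshold of 0.642 with the classes swapped acts like a cutoff of about 0.358 on the positive-class confidence, which is close to the 0.356 listed in the ROC output for the same TPR and FPR.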

