Precision-Recall Curves

amitdeokar Member, University Professor Posts: 20  Maven
I am following up on a previous discussion on Precision-Recall Curves. I cannot seem to edit or reply to it, so I am creating a new thread with the same subject.

I have attached the test process as XML, which uses the Ripley dataset for example purposes. In a nutshell, I have optimized the threshold based on F-measure and then output the model performance for that "best threshold". I have also used the "ROC Curve to ExampleSet" operator from the Converters extension to generate 500 data points as an ExampleSet. I notice a discrepancy in the results which is intriguing. If someone could help reconcile it, it would be much appreciated.
  1. The best threshold is found to be 0.642, which corresponds to a confusion matrix with TP=112, TN=104, FN=13, FP=21, giving recall=TPR=89.6%, specificity=83.2%, FPR=1-specificity=16.8%, and F-measure=86.82%.
  2. Now, the "ROC Curve to ExampleSet" output lists the different FPRs, TPRs, and confidence thresholds. I find that for TPR=89.6% and FPR=16.8%, the confidence (threshold) is listed as 0.356, which is puzzling. Also, for the confidence (threshold) of 0.643, TPR=66.4% and FPR=4.2%.
  3. Why do the values from steps 1 and 2 not match up?
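For reference, the percentages in step 1 follow directly from the stated confusion matrix. A quick sanity check in plain Python (outside RapidMiner) confirms the reported values:

```python
# Recompute the step-1 metrics from the confusion matrix at the
# "best threshold": TP=112, TN=104, FN=13, FP=21.
TP, TN, FN, FP = 112, 104, 13, 21

recall      = TP / (TP + FN)                            # a.k.a. TPR
specificity = TN / (TN + FP)
fpr         = 1 - specificity                           # = FP / (FP + TN)
precision   = TP / (TP + FP)
f_measure   = 2 * precision * recall / (precision + recall)

print(round(recall, 3), round(specificity, 3),
      round(fpr, 3), round(f_measure, 4))
# → 0.896 0.832 0.168 0.8682
```

These match the reported recall=89.6%, specificity=83.2%, FPR=16.8%, and F-measure=86.82%, so the step-1 numbers are internally consistent; the puzzle is only in how they line up with the ROC table in step 2.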

Best Answer

  • amitdeokar Posts: 20  Maven
    Solution Accepted
    I think I figured it out. I had swapped the "first class" and "second class" parameters in the Create Threshold operator nested in the Optimize Parameters (Grid) operator. It is counterintuitive, but apparently "first class" has to be set to be the negative class (in this example 0) and "second class" has to be set to the positive class (in this example 1).