Precision-Recall Curves

amitd Member, University Professor Posts: 49 Maven

It's great that we have the AUPRC value generated through the Operator Toolbox Extension. Even more useful would be the precision-recall curve for a classifier (across all threshold or cutoff values), especially when the class labels in the dataset are significantly skewed. See the linked description, borrowed from "Introduction to Data Mining" (2nd edition) by Tan et al. The intent is to show the resulting PR curve: PR-curve link (part 1), PR-curve link (part 2)


Declined · Last Updated

Workaround created (see below)

Comments

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi @amitdeokar,

     

    Thanks for using the operator I've written. I see the point, and we should probably sit down and write it. In the meantime, I think you can just use the "ROC to Example Set" operator from the converters. This gives you TPR and FPR for n data points. TPR is already recall, and from there you can calculate precision.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • amitd Member, University Professor Posts: 49 Maven

    Great. Can you elaborate a bit on your suggestion to use "ROC to Example Set"? I want to give this a try but would like to know more about the specifics. I tried storing the result in the repository, but that stores only the ROC curve itself, not the underlying data. Thanks.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi @amitdeokar,

    I thought you wanted to have the underlying ROC (or PR) data? Am I mistaken?

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • amitd Member, University Professor Posts: 49 Maven

    Correct, your suggestion is about leveraging the numeric data that underlies the ROC curve (basically the TPR and FPR for all thresholds). However, I am not able to view this data directly; I can see only the plotted curves.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi @amitdeokar,

    That's what the operator "ROC to Example Set" does. It gives you the underlying data as an example set.

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
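
The workaround discussed in this thread (take TPR and FPR per threshold, treat TPR as recall, and derive precision) can be sketched outside RapidMiner as follows. This is only an illustration under stated assumptions: the conversion also needs the number of positive and negative examples in the scored data, which the thread does not mention, and the function names and the (FPR, TPR) tuple layout are hypothetical rather than the actual output format of "ROC to Example Set". With P positives and N negatives, TP = TPR * P and FP = FPR * N, so precision = TP / (TP + FP).

```python
from typing import List, Optional, Tuple


def precision_from_roc(tpr: float, fpr: float, n_pos: int, n_neg: int) -> Optional[float]:
    """Convert one ROC point into precision, given the class counts.

    Assumption: n_pos and n_neg are the positive/negative example counts
    of the same data the ROC points were computed on.
    """
    tp = tpr * n_pos  # true positives at this threshold
    fp = fpr * n_neg  # false positives at this threshold
    if tp + fp == 0:
        return None  # precision is undefined when nothing is predicted positive
    return tp / (tp + fp)


def pr_points(
    roc_points: List[Tuple[float, float]], n_pos: int, n_neg: int
) -> List[Tuple[float, Optional[float]]]:
    """Map (FPR, TPR) pairs to (recall, precision) pairs; recall equals TPR."""
    return [
        (tpr, precision_from_roc(tpr, fpr, n_pos, n_neg))
        for fpr, tpr in roc_points
    ]


if __name__ == "__main__":
    # Hypothetical ROC points for a skewed dataset: 100 positives, 900 negatives.
    roc = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]
    for recall, precision in pr_points(roc, n_pos=100, n_neg=900):
        print(recall, precision)
```

Plotting precision against recall for these pairs gives the PR curve. Note how the skew shows up: a point that looks strong on the ROC curve (TPR 0.8 at FPR 0.1) corresponds to a precision below 0.5 when negatives outnumber positives nine to one.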