Precision-Recall Curves

amitdeokar Member, University Professor Posts: 20  Maven

It's great that we have the AUPRC value generated through the Operator Toolbox Extension. What would be much more useful are the precision-recall curves for a classifier (across all threshold or cutoff values), especially when the dataset has a significant skew in the class labels. See the linked description, borrowed from "Introduction to Data Mining" (2nd edition) by Tan et al. The intent is to show the resulting PR curve: PR-curve link (part 1), PR-curve link (part 2)


Declined

Workaround created (see below)

Comments

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,403  RM Data Scientist

    Hi @amitdeokar,

     

    Thanks for using the operator I've written. I see the point, and we should probably sit down and write it. In the meantime, I think you can just use the "ROC to Example Set" operator from the converters. This gives you TPR and FPR for n data points. TPR is already recall, and you can go from there to calculate precision.

     

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
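
The calculation suggested above can be sketched outside RapidMiner. This is only a minimal Python illustration, not the operator's implementation: the function name `pr_points_from_roc`, the sample TPR/FPR values, and the class counts are all hypothetical. It assumes the TPR/FPR pairs have been exported and that the number of positive and negative examples is known, because precision = TP / (TP + FP) with TP = TPR × positives and FP = FPR × negatives.

```python
def pr_points_from_roc(tpr, fpr, n_pos, n_neg):
    """Turn ROC points (TPR, FPR) into (recall, precision) pairs.

    n_pos and n_neg are the counts of positive and negative examples;
    they are needed because ROC rates alone do not determine precision.
    """
    points = []
    for t, f in zip(tpr, fpr):
        tp = t * n_pos  # true positives at this threshold
        fp = f * n_neg  # false positives at this threshold
        if tp + fp == 0:
            # Nothing is predicted positive, so precision is undefined; skip.
            continue
        points.append((t, tp / (tp + fp)))  # recall is the TPR itself
    return points


# Hypothetical ROC data for a skewed dataset (100 positives, 900 negatives).
roc_tpr = [0.0, 0.5, 0.8, 1.0]
roc_fpr = [0.0, 0.1, 0.3, 1.0]
pr_points = pr_points_from_roc(roc_tpr, roc_fpr, n_pos=100, n_neg=900)

for recall, precision in pr_points:
    print(f"recall={recall:.2f}  precision={precision:.3f}")
```

The resulting (recall, precision) pairs can then be plotted as a PR curve with any charting tool. With a strong class skew, precision drops quickly even at a low FPR, which is why the PR view is more informative than ROC here.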
  • amitdeokar Member, University Professor Posts: 20  Maven

    Great. Can you elaborate a bit on your suggestion to "use ROC to Example Set of converters"? I want to give this a try but would like to know a bit more about the specifics. I tried storing the result in the repository, but that stores only the ROC curve itself, not the underlying data. Thanks.

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,403  RM Data Scientist

    Hi @amitdeokar,

    I thought you wanted to have the underlying ROC (or PR) data? Am I mistaken?

     

    BR,

    Martin

  • amitdeokar Member, University Professor Posts: 20  Maven

    Correct, your suggestion is about leveraging the numeric data that underlies the ROC curve (which is basically the TPR and FPR for all thresholds). However, I am not able to directly view this data itself, but only the plotted curves. 

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,403  RM Data Scientist

    Hi @amitdeokar,

    That's what the operator "ROC to Example Set" does. It gives you the underlying data as an example set.

     

    BR,

    Martin
