Set Cutoff for Classification using Logistic Regression

btibert Member, University Professor Posts: 68  Maven
edited September 20 in Help
Is it possible to manually set the cutoff threshold used to predict the label with a logistic regression?  I read that the default cutoff is 0.5, which I understand, but my dataset is heavily imbalanced and I would like to set it by hand.  There appears to be an automated way to do this, but for the sake of teaching the concept of the cutoff, I would prefer to show it manually.

Thanks!

Best Answer

  • arjun_gopal Posts: 7 Contributor II
    Solution Accepted
    Hi,
    "Create Threshold" and "Apply Threshold" should do the trick for you. 



Answers

  • varunm1 Moderator, Member Posts: 918   Unicorn
    Hello @btibert

    The operators "Create Threshold" and "Apply Threshold" do this. Please let me know if this is what you are looking for.

    Hope this helps
    Regards,
    Varun
    Rapidminer Wisdom 2020 (User Track): Call for proposals 

    https://www.varunmandalapu.com/
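For readers who want to see the cutoff idea outside RapidMiner, here is a minimal Python sketch. It is not how the Create Threshold and Apply Threshold operators are implemented; it only illustrates the concept of labeling a case as the positive class when its confidence exceeds a hand-picked cutoff. The function names, the "yes"/"no" labels, and the confidences are all made up for illustration.

```python
def apply_cutoff(positive_confidences, cutoff=0.5, positive="yes", negative="no"):
    """Label a case positive when its confidence exceeds the cutoff."""
    return [positive if p > cutoff else negative for p in positive_confidences]


def confusion_counts(labels, predictions, positive="yes"):
    """Count true/false positives and negatives for the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(labels, predictions):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


# Hypothetical confidences for the rare "yes" class, as a fitted
# logistic regression might produce them, with the true labels.
confidences = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.80]
labels = ["no", "no", "no", "yes", "no", "yes", "no", "yes"]

# Compare the default cutoff with a lower, hand-picked one.
for cutoff in (0.5, 0.25):
    predictions = apply_cutoff(confidences, cutoff)
    print(cutoff, confusion_counts(labels, predictions))
```

With this toy data, lowering the cutoff from 0.5 to 0.25 catches all three "yes" cases at the cost of one extra false positive, which is the trade-off worth showing on an imbalanced dataset.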
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,188  RM Data Scientist
    Hi,
    I think what @btibert is referring to is Platt scaling. The operator Rescale Confidences (Logistic) is, I think, what he is looking for. You can combine this with thresholds afterwards.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
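As a rough illustration of Platt scaling, the sketch below fits a sigmoid p = 1 / (1 + exp(-(a * score + b))) to raw classifier scores by plain gradient descent on the log loss. This is a simplified teaching version: it is not the Rescale Confidences (Logistic) operator, and it skips the smoothed targets and the more robust optimizer used in the original Platt scaling procedure. The scores, labels, learning rate, and function names are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, learning_rate=0.1, epochs=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for score, label in zip(scores, labels):
            error = sigmoid(a * score + b) - label  # gradient of log loss w.r.t. the logit
            grad_a += error * score
            grad_b += error
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated confidence between 0 and 1."""
    return sigmoid(a * score + b)


# Hypothetical raw scores and 0/1 labels (1 is the positive class).
scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [calibrate(s, a, b) for s in scores]

# A manual cutoff can then be applied to the calibrated confidences.
cutoff = 0.3
predictions = [1 if p > cutoff else 0 for p in calibrated]
print(predictions)
```

The rescaling step changes the confidences, not the ranking of the cases, so the manual cutoff is still a separate decision made afterwards.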
  • varunm1 Moderator, Member Posts: 918   Unicorn
    @mschmitz oops my bad, I totally missed the Logistic Regression.
    Regards,
    Varun
  • btibert Member, University Professor Posts: 68  Maven
    I had seen those operators, @gopala, but had a hard time wrapping my head around the construction. Thanks for the screenshot; it is now intuitive.  Thanks!
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,270   Unicorn
    You can also use the Drop Uncertain Predictions operator if you want to exclude ambiguous cases rather than force them into one category or the other simply by lowering (or raising) the threshold.  This is often another helpful way of dealing with the issue because it allows you to recalculate the performance metrics without the excluded cases.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • btibert Member, University Professor Posts: 68  Maven
    Thanks @Telcontar120, I will keep that in mind as well. Because this will likely be the first time my students have really sunk their teeth into logistic regression, discussing the cutoff and modifying it manually is perfect for helping them understand the construction before they use tools that optimize it for them.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,270   Unicorn
    @btibert sure, that makes sense, but just for clarity, Drop Uncertain Predictions doesn't automatically optimize anything.  It simply excludes predictions below a certain confidence level that is set manually.  It is conceptually the same as Create Threshold, only Create Threshold says "use all data but don't change my prediction until the confidence is above 70%" and Drop Uncertain Predictions says "only keep predictions that are above 70% confidence."
    If you take a look at the tutorial process it should make the outcome a bit clearer.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
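The difference described above can be sketched in a few lines of Python. This is only a conceptual illustration, not the operators' actual behavior: it assumes a binary problem, that the listed confidences are for the "yes" class, and that a prediction's confidence is the larger of the two class confidences. The function names and numbers are hypothetical.

```python
def predict_with_threshold(confidences, threshold):
    """Keep every case; predict "yes" only above the threshold."""
    return ["yes" if c > threshold else "no" for c in confidences]


def drop_uncertain(confidences, min_confidence):
    """Keep only cases whose predicted class reaches min_confidence.

    Returns (index, label) pairs for the cases that survive.
    """
    kept = []
    for index, c in enumerate(confidences):
        # Default 0.5 cutoff decides the label; confidence is that class's share.
        label, confidence = ("yes", c) if c >= 0.5 else ("no", 1.0 - c)
        if confidence >= min_confidence:
            kept.append((index, label))
    return kept


# Hypothetical confidences for the "yes" class.
confidences = [0.10, 0.40, 0.55, 0.65, 0.90]

print(predict_with_threshold(confidences, 0.7))  # all five cases get a label
print(drop_uncertain(confidences, 0.7))          # only the confident cases remain
```

With a 0.7 setting, the threshold version labels all five cases (four "no", one "yes"), while the drop version keeps only the first and last case, so performance metrics would then be computed on two cases instead of five.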
  • btibert Member, University Professor Posts: 68  Maven
    edited September 21
    Will do, @Telcontar120, many thanks for the follow-up note.