
Set Cutoff for Classification using Logistic Regression

btibert Member, University Professor Posts: 146 Guru
edited September 2019 in Help
Is it possible to manually set the cutoff threshold used to predict the label from a logistic regression? I read that the default cutoff is 0.5, which I get, but my dataset is heavily imbalanced and I would like to set it by hand. There appears to be an automated way to do this, but for the sake of teaching the concept of the cutoff, I would prefer to show it manually.

Thanks!
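For teaching purposes, the cutoff rule can be written out explicitly. This is a minimal sketch in generic notation: the weights w, intercept b and cutoff t are symbols introduced here for illustration, not RapidMiner parameter names. Cases exactly at the cutoff are assigned to the positive class here; tools differ on that boundary convention.

```latex
\hat{p}(x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}, \qquad
\hat{y}(x) = \begin{cases} 1 & \text{if } \hat{p}(x) \ge t \\ 0 & \text{otherwise} \end{cases}

\hat{p}(x) \ge t \iff w^\top x + b \ge \ln\frac{t}{1 - t}
```

With the default t = 0.5 the right-hand side is ln 1 = 0, the usual decision boundary. Lowering the cutoff to, say, t = 0.2 moves the boundary to ln(0.25) ≈ -1.386, so more cases are labeled positive, which is the usual lever on an imbalanced dataset.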

Best Answer

  • arjun_gopal Member Posts: 7 Contributor II
    Solution Accepted
    Hi,
    "Create Threshold" and "Apply Threshold" should do the trick for you. 



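To show what a manual cutoff does under the hood, here is a minimal plain-Python sketch. It is not RapidMiner's implementation of Create Threshold / Apply Threshold: `apply_cutoff`, `confusion_counts` and the toy data are hypothetical, and the sketch assumes the model already outputs a confidence for the positive class (coded 1).

```python
from typing import List, Tuple

def apply_cutoff(confidences: List[float], cutoff: float) -> List[int]:
    """Label a case positive (1) when its positive-class confidence reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in confidences]

def confusion_counts(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels coded 0/1."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical imbalanced toy data: 2 positives among 10 cases,
# with the model's confidence for class 1.
actual      = [0,    0,    0,    0,    0,    0,    0,    0,    1,    1]
confidences = [0.05, 0.10, 0.12, 0.20, 0.22, 0.32, 0.35, 0.40, 0.38, 0.62]

for cutoff in (0.5, 0.3):
    predicted = apply_cutoff(confidences, cutoff)
    print(cutoff, confusion_counts(actual, predicted))
# 0.5 (1, 0, 1, 8)
# 0.3 (2, 3, 0, 5)
```

Lowering the cutoff from 0.5 to 0.3 catches both positives at the cost of three false positives, which is the trade-off students usually need to see on imbalanced data.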
Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @btibert

    The operators "Create Threshold" and "Apply Threshold" do this. Please let me know if this is what you are looking for.

    Hope this helps
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi,
    I think what @btibert is referring to is Platt scaling. The Rescale Confidences (Logistic) operator is, I think, what he is looking for. You can combine it with the threshold operators afterwards.
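Platt scaling fits a one-dimensional logistic regression that maps a model's raw scores to calibrated probabilities, after which a cutoff can be applied. The sketch below is a generic illustration in plain Python, not the implementation of the Rescale Confidences (Logistic) operator; the function names, learning rate, epoch count and toy data are assumptions. It fits plain 0/1 labels by gradient descent, whereas Platt's original method uses slightly smoothed target values, and it writes the model as sigmoid(a * score + b), which is Platt's 1 / (1 + exp(A * score + B)) with a = -A and b = -B.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores: List[float], labels: List[int],
              lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by batch gradient descent on mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the linear term
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def rescale(score: float, a: float, b: float) -> float:
    """Map a raw score to a calibrated positive-class probability."""
    return sigmoid(a * score + b)

# Hypothetical held-out raw scores and their true 0/1 labels.
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [rescale(s, a, b) for s in scores]
# A manually chosen cutoff can then be applied to the calibrated probabilities.
predicted = [1 if p >= 0.5 else 0 for p in calibrated]
```

Fitting on held-out data rather than the training set is the usual practice, so the calibration does not inherit the model's overfitting.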

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @mschmitz Oops, my bad; I totally missed the logistic regression part.
    Regards,
    Varun

  • btibert Member, University Professor Posts: 146 Guru
    I had seen those operators, @gopala, but had a hard time wrapping my head around how to construct the process. Thanks for the screenshot; it makes the construction intuitive now. Thanks!
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You can also use the Drop Uncertain Predictions operator if you want to exclude ambivalent cases rather than force them into one class or the other simply by lowering (or raising) the threshold. This is often another helpful way of dealing with the issue, because it lets you recalculate the performance metrics without the excluded cases.
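A minimal plain-Python sketch of the idea: predict with the default 0.5 decision, drop cases whose winning-class confidence is below a manually chosen minimum, and recompute accuracy on what remains. The function names and toy data are hypothetical, and the sketch simply removes rows; the actual operator may represent excluded cases differently (for example as missing predictions).

```python
from typing import List, Tuple

def drop_uncertain(actual: List[int], confidences: List[float],
                   min_conf: float) -> List[Tuple[int, int]]:
    """Return (actual, predicted) pairs only for cases whose winning-class
    confidence reaches min_conf; all other cases are dropped."""
    kept = []
    for y, p in zip(actual, confidences):
        predicted = 1 if p >= 0.5 else 0   # default 0.5 decision
        confidence = max(p, 1.0 - p)       # confidence of the predicted class
        if confidence >= min_conf:
            kept.append((y, predicted))
    return kept

def accuracy(pairs: List[Tuple[int, int]]) -> float:
    return sum(1 for y, yhat in pairs if y == yhat) / len(pairs)

# Hypothetical imbalanced toy data: positive-class confidences for 10 cases.
actual      = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
confidences = [0.05, 0.10, 0.12, 0.20, 0.22, 0.28, 0.35, 0.40, 0.38, 0.62]

all_cases = drop_uncertain(actual, confidences, min_conf=0.0)  # keeps everything
confident = drop_uncertain(actual, confidences, min_conf=0.7)
print(len(all_cases), accuracy(all_cases))  # 10 0.9
print(len(confident), accuracy(confident))  # 6 1.0
```

On this toy data both positives fall among the dropped cases, so the perfect accuracy on the kept subset says nothing about the minority class; reporting how many cases were kept alongside the metric guards against that.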

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • btibert Member, University Professor Posts: 146 Guru
    Thanks, @Telcontar120, I will keep that in mind as well. But because this will likely be the first time my students have really sunk their teeth into logistic regression, discussing the cutoff and modifying it manually is perfect for helping them understand the construction before they use tools that optimize it for them.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    @btibert Sure, that makes sense, but just for clarity: Drop Uncertain Predictions doesn't automatically optimize anything. It simply excludes predictions below a confidence level that you set manually. It is conceptually the same as Create Threshold, except that Create Threshold says "use all data, but don't change my prediction until the confidence is above 70%", while Drop Uncertain Predictions says "only keep predictions that are above 70% confidence."
    If you take a look at the tutorial process, it should make the outcome a bit clearer.
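The difference between the two behaviours at a 70% level can be sketched side by side in plain Python. This is a hypothetical illustration, not either operator's code: `threshold_label`, `drop_label` and the five confidences are assumptions, `None` stands in for a dropped case, and confidences exactly at 0.7 are treated as meeting the level.

```python
from typing import Optional

CUTOFF = 0.7  # the 70% level used in the comparison

def threshold_label(p: float, cutoff: float = CUTOFF) -> int:
    """Threshold style: every case gets a label; class 1 only once its
    positive-class confidence reaches the cutoff."""
    return 1 if p >= cutoff else 0

def drop_label(p: float, min_conf: float = CUTOFF) -> Optional[int]:
    """Drop style: default 0.5 decision, but return None (dropped) unless
    the winning class's confidence reaches min_conf."""
    label = 1 if p >= 0.5 else 0
    return label if max(p, 1.0 - p) >= min_conf else None

# Hypothetical positive-class confidences for five cases.
confidences = [0.15, 0.45, 0.65, 0.75, 0.90]
threshold_labels = [threshold_label(p) for p in confidences]
drop_labels = [drop_label(p) for p in confidences]
print(threshold_labels)  # [0, 0, 0, 1, 1]
print(drop_labels)       # [0, None, None, 1, 1]
```

The threshold version still labels all five cases (the 0.65 case becomes class 0), while the drop version keeps only the three confident ones and leaves the two ambivalent cases out of any performance calculation.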
  • btibert Member, University Professor Posts: 146 Guru
    edited September 2019
    Will do, @Telcontar120. Many thanks for the follow-up note.
