How can we see the threshold chosen by the Auto Model classification model for the final confusion matrix?

unm Member Posts: 2 Learner I
edited June 2019 in Help
The Auto Model we created uses GBTree and produces a confusion matrix. We would like to see which threshold it used to create this matrix. Is there a way to view the threshold?

Answers

  • unm Member Posts: 2 Learner I
    Thanks @kypexin and @Telcontar120. We really appreciate your time answering this. Yes, we guessed as much (0.5 as the threshold) but wanted to confirm it and see whether it does anything more intelligent. That answers the question!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    We have actually been discussing this a bit.  It is hard to do this in a really intelligent way for the reasons @kypexin mentioned.  Without knowing the business context, one value is almost as good as any other :-)

    However, there are three ways from here to potentially improve this a bit:
    1. Offer a full-blown cost matrix based approach for Auto Model and perform a threshold optimization for optimizing profits / costs
    2. Optimize thresholds in a way that Accuracy (or F-Measure or...) is maximized
    3. Do nothing and leave it as it is
    I personally do not like No 1 since it would take away some of the simplicity of AM in the early prototyping phase. But I see the benefits, of course, and could imagine making this optional.
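
To make No 1 concrete, here is a minimal sketch of cost-based threshold selection in plain Python. It is not how Auto Model works today (the thread confirms it uses 0.5); the function names, labels, confidences, and cost values are all made up for illustration. The idea is to pick the threshold that minimizes the total misclassification cost on a labeled validation set.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Sum the misclassification costs when predicting positive for score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative
    return cost


def cheapest_threshold(y_true, scores, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost (first one wins on ties)."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, cost_fp, cost_fn),
    )


# Hypothetical validation data: 7 negatives, 3 positives, with model confidences.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.31, 0.46, 0.36, 0.41, 0.62]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# Assumed costs: a missed positive is ten times as expensive as a false alarm.
fn_heavy = cheapest_threshold(y_true, scores, candidates, cost_fp=1, cost_fn=10)
# The reverse situation: false alarms are the expensive error.
fp_heavy = cheapest_threshold(y_true, scores, candidates, cost_fp=10, cost_fn=1)
print(fn_heavy, fp_heavy)
```

With the same model and data, the chosen threshold moves depending on the costs, which is why the business context matters so much here.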

    No 2 is at least avoiding problems with strongly imbalanced data sets and is what many internal people here at RM would love to see for AM.
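
As a rough illustration of No 2, the sketch below sweeps candidate thresholds and keeps the one with the highest F-measure (F1). It is plain Python under stated assumptions, not RapidMiner code: the helper names and the small imbalanced data set are hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_f1_threshold(y_true, scores, candidates):
    """Return (threshold, f1) for the candidate with the highest F1 (first one wins on ties)."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp, fp, fn, _ = confusion_counts(y_true, scores, t)
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical imbalanced validation data: 7 negatives, 3 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.31, 0.46, 0.36, 0.41, 0.62]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

default_counts = confusion_counts(y_true, scores, 0.5)
best_t, best_f1 = best_f1_threshold(y_true, scores, candidates)
print(default_counts, best_t, round(best_f1, 3))
```

On this toy data the default 0.5 threshold misses two of the three positives, while the F1-optimal threshold sits lower. To avoid an optimistic estimate, the threshold should be tuned on validation data rather than on the data used for the final confusion matrix.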

    No 3 is very efficient in terms of resources :-)

    I appreciate any opinion here (including additional ideas).  We may be able to improve this for one of the future releases if we have a good plan which is widely preferred.

    Thanks,
    Ingo
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Personally, I think option #3 has the virtue of simplicity as well as efficiency, and thus is a good choice for Auto Model.  Many users of Auto Model might not understand the nuances of threshold selection and modification, and I fear that building it into Auto Model automatically (as in option #2) could lead to additional confusion and misunderstanding later.  So my vote would be to keep option #3.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts