HELP! - How can I determine the quality of attributes for specific predictions?

alal Member Posts: 1 Contributor I
edited November 2018 in Help

Hi everyone,

This is really baking my noodle: I'd like to understand how to determine the quality of a predictor in a built model. That is, is there an operator/method to determine the attribute(s) - and even values - that provide for the prediction output?

 

It's a binary classification problem; model accuracy and other performance measures are fine, but just adding a 'Weight by...' operator to the data doesn't seem legit. It also doesn't provide values the way a decision tree does for its splits. But that in itself is a problem: a GBT model with 20 trees means I can't distill the splits into something readable and manageable that can be applied in a business context.

 

Help!

 

Thanks all

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    This is a tricky question, because the definition of "a quality predictor" in a given model is actually not that easy to develop.

    As you mentioned, you can use the various "weight by" operators to get an independent, univariate view of the strength of association between individual attributes and the label.

    Some algorithms output information on "variable importance" or "variable weight" as part of their model description.  However, you sometimes have to read through the Java code to determine exactly how it is calculated.  In the case of GBT, such information is available.  Have you reviewed it?  It probably provides something along the lines of what you are interested in.

    I think there is also a new project in progress to bring a more general framework of variable importance into RapidMiner via the LIME project. You can read more about LIME here: https://github.com/marcotcr/lime

    @mschmitz any update on the LIME project in RapidMiner?

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    An initial version of Get Local Interpretation (aka Martin's adaptation of LIME) has been available in the Operator Toolbox since last week.

     

    Writing a blog post on it is on my queue of things to do :/


    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
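
For readers who want to see the general idea behind a LIME-style local interpretation, here is a minimal Python sketch. It is not the Get Local Interpretation operator and not the `lime` library; the thread does not show either one's internals. Everything in it is an assumption made for illustration: the `black_box` function stands in for a trained model such as a GBT, and the attribute names, coefficients, scales, and kernel width are hypothetical. The sketch perturbs one example, scores the perturbed rows with the model, weights them by how close they are to the original example, and fits a weighted linear model whose coefficients act as local attribute weights for that single prediction.

```python
import math
import random


def black_box(row):
    """Hypothetical stand-in for a trained binary classifier: returns P(label = 1).

    In practice this would be the scoring function of the real model.
    """
    tenure_months, support_calls, monthly_fee = row
    score = (-0.05 * (tenure_months - 24)
             + 0.8 * (support_calls - 2)
             + 0.02 * (monthly_fee - 50))
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def explain_locally(predict, instance, scales, n_samples=2000,
                    kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one example.

    Returns one weight per attribute: the local change in the model output
    per one `scales[j]` step of attribute j.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y,
    # where each design row is [1, z_1, ..., z_d] in scaled offsets.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        sample = [instance[j] + z[j] * scales[j] for j in range(d)]
        y = predict(sample)
        # Gaussian proximity kernel: nearby perturbations count more.
        w = math.exp(-sum(v * v for v in z) / kernel_width ** 2)
        row = [1.0] + z
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # drop the intercept


names = ["tenure_months", "support_calls", "monthly_fee"]  # hypothetical attributes
example = [12.0, 3.0, 60.0]                                # the prediction to explain
scales = [12.0, 1.5, 20.0]                                 # assumed perturbation sizes

weights = explain_locally(black_box, example, scales)
for name, weight in sorted(zip(names, weights), key=lambda p: -abs(p[1])):
    print(f"{name}: {weight:+.4f}")
```

A positive weight means that increasing the attribute pushes this particular prediction toward the positive class, and a negative weight pushes it away. The weights are local: a different example can produce a different ranking, which is what separates this from a single global importance table.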