Gradient Boosted Trees: extract feature importance?

MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
edited December 2018 in Product Feedback - Resolved

Hi all,

Is it possible to extract the feature importance from the Gradient Boosted Trees model?

The most convenient solution would be a weights output on the operator in one of the next releases, but I'm sure it must also be possible with some Groovy code?

Unfortunately, the description result of the model shows only the ~10 most/least important features, which is not enough if you have many features.

Cheers,

Marius

8 votes

Fixed and Released

Comments

  • zprekopcsak RapidMiner Certified Expert, Member Posts: 47 Guru

    Hi Marius,

    Good point, this is in fact something we are considering for one of our upcoming releases. I just raised the priority in our tracking system so hopefully it will make it into a release very soon.

    Best, Zoltan

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    That's good news, thanks! I hope "one of our upcoming releases" means soon :)

    Cheers,

    Marius

  • phellinger Employee, Member Posts: 103 RM Engineering

    Hi Marius,

    This one is already part of the upcoming 7.3 release.

    Logistic Regression, Gradient Boosted Trees, and Generalized Linear Model all provide an attribute weights vector output.

    (Extracting the weights with Groovy scripts is not possible, due to security restrictions if nothing else.)

    Best,

    Peter

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @phellinger and @zprekopcsak, perhaps a more general operator that extracts attribute importance weights from any model would be even more helpful? I know there is no single definition of attribute importance inside a multivariate model, but in theory one simple approach is to take the list of all model attributes, remove them from the final model one at a time, and measure the resulting deterioration in model performance. The attributes are then ranked accordingly: the attribute whose removal leads to the greatest decrease in performance gets the highest weight, and all other attributes' weights are scaled relative to it.

    This can of course be done manually today (and even with loops to cut down on repetitive operations), but it would be nice if RapidMiner added an operator that does this automatically for any model and outputs the resulting table as a set of weights. In my view this would answer a very common question from business users about attribute/variable importance in multivariate models.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
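
The drop-one-attribute idea described in the comment above can be sketched outside RapidMiner. The following is a minimal Python illustration, not a RapidMiner operator or API: the nearest-centroid classifier, the synthetic data, and all function names are assumptions made for the example, and "removing an attribute" is interpreted here as retraining the model without that attribute.

```python
import random

def fit_centroids(rows, labels):
    """Toy nearest-centroid classifier: store one mean vector per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in grouped.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

def accuracy(train_x, train_y, test_x, test_y):
    """Train on the training split and score on the held-out split."""
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, row) == y for row, y in zip(test_x, test_y))
    return hits / len(test_y)

def without(rows, index):
    """Copy the rows with one attribute (column) removed."""
    return [row[:index] + row[index + 1:] for row in rows]

def drop_one_weights(names, train_x, train_y, test_x, test_y):
    """Weight each attribute by the accuracy lost when it is removed."""
    baseline = accuracy(train_x, train_y, test_x, test_y)
    drops = {}
    for i, name in enumerate(names):
        reduced = accuracy(without(train_x, i), train_y, without(test_x, i), test_y)
        # Removing an attribute can slightly improve accuracy; clip that to zero.
        drops[name] = max(0.0, baseline - reduced)
    largest = max(drops.values())
    # Scale so the attribute with the greatest deterioration gets weight 1.0.
    return {name: (d / largest if largest > 0 else 0.0) for name, d in drops.items()}

def make_data(n, rng):
    """Synthetic two-class data (assumed for illustration only)."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append([4.0 * y + rng.gauss(0, 1),  # strongly separates the classes
                     1.0 * y + rng.gauss(0, 1),  # weakly separates the classes
                     rng.gauss(0, 1)])           # pure noise
        labels.append(y)
    return rows, labels

rng = random.Random(0)
names = ["strong", "weak", "noise"]
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(500, rng)

weights = drop_one_weights(names, train_x, train_y, test_x, test_y)
for name in names:
    print(f"{name}: {weights[name]:.2f}")
```

Scoring on a held-out split rather than the training data keeps the weights from rewarding attributes the model has merely memorized; any performance measure could be substituted for accuracy.
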
  • zprekopcsak RapidMiner Certified Expert, Member Posts: 47 Guru

    @Telcontar120: That sounds like an interesting idea, but it might be misleading. If you have two highly correlated attributes, then removing either one of them will not noticeably change performance, even though one of them may be needed for a good model.

    Otherwise, I agree that explaining and adding narrative to a model is very important, and we are considering various ways to improve in that area.
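
The correlated-attribute caveat raised in this reply can be reproduced with a small experiment. The sketch below is only an illustration under stated assumptions: the nearest-centroid classifier, the synthetic attributes `a` and `b` (where `b` is a near-copy of `a`), and the helper names are all hypothetical and unrelated to RapidMiner's own operators.

```python
import random

def centroid_accuracy(train_x, train_y, test_x, test_y, keep):
    """Nearest-centroid accuracy using only the attribute indices in `keep`."""
    def project(row):
        return [row[i] for i in keep]

    grouped = {}
    for row, label in zip(train_x, train_y):
        grouped.setdefault(label, []).append(project(row))
    centroids = {label: [sum(col) / len(col) for col in zip(*members)]
                 for label, members in grouped.items()}

    hits = 0
    for row, label in zip(test_x, test_y):
        point = project(row)
        guess = min(centroids,
                    key=lambda c: sum((p - q) ** 2
                                      for p, q in zip(point, centroids[c])))
        hits += guess == label
    return hits / len(test_y)

def make_data(n, rng):
    """Synthetic data (assumed): a and b are almost identical, noise is useless."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        a = 4.0 * y + rng.gauss(0, 1)
        b = a + rng.gauss(0, 0.01)  # near-duplicate of a
        rows.append([a, b, rng.gauss(0, 1)])
        labels.append(y)
    return rows, labels

rng = random.Random(1)
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(500, rng)
data = (train_x, train_y, test_x, test_y)

baseline = centroid_accuracy(*data, keep=[0, 1, 2])
drop_a = baseline - centroid_accuracy(*data, keep=[1, 2])   # remove a only
drop_b = baseline - centroid_accuracy(*data, keep=[0, 2])   # remove b only
drop_both = baseline - centroid_accuracy(*data, keep=[2])   # remove a and b

print(f"accuracy lost without a:       {drop_a:.3f}")
print(f"accuracy lost without b:       {drop_b:.3f}")
print(f"accuracy lost without a and b: {drop_both:.3f}")
```

Removing either attribute alone costs almost nothing because the other one carries the same information, so a drop-one ranking would give both a weight near zero. Removing them together reveals how much the model actually depends on that shared signal.
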
