Variable importance in deep learning and what to do with it?

fstarsinic Member Posts: 20 Contributor II
What does one learn from variable importance?
What might one change in the model based on what it shows?

If you see variables at the top that seem "very important" but that you know are not, does that mean they are candidates for "attribute removal" or "weight reduction" or...?

Example: I have a few category attributes that are hierarchical. If the uppermost parent category has high importance, does it really need to be there at all if the lower categories are the ones that really tell the story? It seems to me this is telling me I can get rid of that feature/attribute, and that perhaps the model is relying too much on the upper-level category to make predictions.

Yes, I know I should just try removing it and see what happens, but in general I'm wondering: how should variable importance be interpreted?



Answers

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Deep learning is said to be capable of its own feature engineering. This is true given enough network complexity, computational power and time. So one way of improving the learning process is still to help with feature selection. Deep learning can assist here by calculating variable importance as it learns, which you can retrieve once training has finished. However, with a very large network this may slow learning significantly, so be careful. H2O suggests using a (distributed) random forest to establish variable importance instead, and in RM you have several weighting operators as well as several feature engineering operators to help you do the same; see the sketch below.
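    A minimal sketch of that random-forest approach using H2O's Python API; the file name data.csv and the column name label are hypothetical placeholders, not anything from this thread:

    ```python
    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()

    # "data.csv" and the "label" column are hypothetical placeholders.
    frame = h2o.import_file("data.csv")
    frame["label"] = frame["label"].asfactor()  # for a classification target
    predictors = [c for c in frame.columns if c != "label"]

    # A distributed random forest trains quickly relative to a deep net,
    # and its variable importances make a reasonable first screening tool.
    drf = H2ORandomForestEstimator(ntrees=100, seed=42)
    drf.train(x=predictors, y="label", training_frame=frame)

    # One row per attribute: relative, scaled and percentage importance.
    print(drf.varimp(use_pandas=True))
    ```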
  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Also, there is nothing better than a bit of experimentation and network tuning 😊
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Keep in mind that variable importance metrics for ML algorithms like this are not easy or straightforward to interpret, because the relationship between an individual attribute and the prediction can be quite localized, i.e. different in magnitude and even direction depending on where you are in the input space.
    So in general these variable importance measures are either based on heuristics or are generated empirically by selectively removing attributes and measuring the proportional loss in predictive power (there is a toy sketch of that removal-based approach at the end of this post). @mschmitz might know the actual method being used "under the hood" for the native DL algorithm.
    In any case, I think the bottom line is that you always need to take them with a grain of salt, and you may want to look at some of the other operators like "Explain Predictions" to explore what is happening for any given set of attribute values.
    And I agree with @jacobcybulski that it is always a good idea to play around with your input attributes manually a bit if they have clear relationships and you are trying to get a better understanding of what is going on (like in your example of a multi-level hierarchical attribute set).
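    As promised, a minimal from-scratch sketch of the removal-based ("drop-column") idea: retrain without each attribute and record the proportional loss in accuracy. It uses scikit-learn and a bundled demo dataset purely for illustration, not the native RapidMiner DL operator, and every name in it is an assumption of the sketch:

    ```python
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Demo data; substitute your own attribute matrix and label column.
    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=0)

    def fit_score(X_tr, X_te):
        """Train a fresh model and return its held-out accuracy."""
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        model.fit(X_tr, y_train)
        return model.score(X_te, y_test)

    baseline = fit_score(X_train, X_test)

    # Drop one attribute at a time, retrain, and report the proportional
    # loss in predictive power: a bigger loss means a more "important"
    # attribute; values near zero are candidates for removal.
    for i, name in enumerate(data.feature_names):
        score = fit_score(np.delete(X_train, i, axis=1),
                          np.delete(X_test, i, axis=1))
        print(f"{name}: {(baseline - score) / baseline:+.2%}")
    ```

    A positive number means the model got worse without the attribute; values near zero (or negative) are the ones worth questioning, much as the original poster suspects for the uppermost parent category.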