Explain Predictions: Ranking attributes that support and contradict correct predictions

varunm1 Member Posts: 285   Unicorn
edited April 10 in Knowledge Base
Hello,

Most feature selection techniques identify the predictors that best support predicting the target label. These techniques are mainly based on the correlation between each predictor and the output label (class).

A limitation of this process is that the importance of attributes changes from one model to another. It depends on how strong an attribute is in the presence of the other attributes, and on the statistical assumptions behind each model.

How can we know which of these variables performed best in predicting the correct label for a particular algorithm?
In RapidMiner, the "Explain Predictions" operator provides statistical and visual output that helps us understand the role of each attribute in a prediction. It uses local correlation values to quantify the role of each attribute (predictor) in predicting the value for a single example in the data. That role can either support or contradict the prediction, and it is visualized beautifully in red and green: red marks attributes that contradict the prediction, and green marks attributes that support it.
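To make the idea of a local, per-example role more concrete, here is a rough sketch (not RapidMiner's actual algorithm) of correlating each attribute with a model's confidence in a small neighborhood around a single example. The toy confidence function, the example values, and the perturbation scale are all made up for illustration:

```python
import numpy as np

# Toy "model": confidence in the predicted class rises with a3 and a4
# and ignores a1 and a2. Purely illustrative.
def confidence(X):
    return 1 / (1 + np.exp(-(1.5 * X[:, 2] + 1.0 * X[:, 3] - 4.0)))

rng = np.random.default_rng(42)
x = np.array([5.1, 3.5, 1.4, 0.2])                 # the single example to explain
neighbors = x + rng.normal(0, 0.5, size=(500, 4))  # perturbed copies around it
conf = confidence(neighbors)

# Local correlation of each attribute with the confidence in the neighborhood:
# positive = supports the prediction (green), negative = contradicts it (red).
local = {f"a{j + 1}": np.corrcoef(neighbors[:, j], conf)[0, 1] for j in range(4)}
print(local)
```

In this toy setup, a3 and a4 come out with clearly positive local correlations, while a1 and a2 hover near zero, which is exactly the kind of per-attribute role the operator visualizes.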

How can we find out which attributes supported and which contradicted the correct predictions?
As explained above, the color codes in the visualization cover both correct and incorrect predictions. What if you are only interested in the attributes that support or contradict the correct predictions? That is the motivation behind this post. In predictive modeling, only a few models can provide a global importance of variables, and finding globally significant attributes is difficult for complex algorithms. With the help of the "Explain Predictions" operator, however, we can generate rankings of the attributes that supported and contradicted the correct predictions. I explain this with a process example below.

The process file attached below is based on the Iris dataset. The problem we are looking at is the classification of different flowers based on four attributes (a1 to a4). I tried to find attribute importance using Auto Model, which rates attributes on four quality factors (https://docs.rapidminer.com/8.1/studio/auto-model/). Now, I first observed the importance of attributes in Auto Model and found that a2 is the best predictor; as you can see in the figure below, it is represented in green. The other three attributes are in yellow, which means they have a medium impact on model predictions. To test this, I ran the models (5-fold cross-validation) with these three attributes included and removed.



Interestingly, the models did much better with all four attributes present than with them absent: the kappa values increased from 0.3 to 0.9. So for this dataset we are better off including all four attributes. The next task is to understand which attributes did well in predicting the correct label. For this, we use the Explain Predictions operator together with some regular operators to rank the performance ( provided this ranking method).


I compared the performance of four classification models (Decision Tree, Random Forest, Gradient Boosted Trees & Neural Network) and identified the importance of attributes in each model for correct predictions. In the figure below, you can observe that the importance of each attribute varies with the algorithm. A positive value indicates an attribute that supports correct predictions and a negative value one that contradicts them. The attributes are sorted by importance.
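As a sketch of how such a ranking can be derived from per-example explanations (the column names and importance values below are hypothetical; this mirrors the idea, not the exact attached process): filter for correctly predicted rows, then aggregate the signed local importances per attribute.

```python
import pandas as pd

# Hypothetical Explain-Predictions-style output: one row per example, with
# signed local importances (positive = supports, negative = contradicts).
explanations = pd.DataFrame({
    "a1": [0.10, -0.05, 0.20, 0.05],
    "a2": [-0.02, 0.01, -0.10, 0.03],
    "a3": [0.30, 0.25, 0.15, -0.20],
    "a4": [0.12, 0.18, -0.03, 0.22],
    "label":      ["setosa", "setosa", "virginica", "versicolor"],
    "prediction": ["setosa", "setosa", "virginica", "setosa"],
})

# Keep only the correctly predicted rows, then average each attribute's
# local importance to get one ranking score per attribute.
correct = explanations[explanations["label"] == explanations["prediction"]]
ranking = correct[["a1", "a2", "a3", "a4"]].mean().sort_values(ascending=False)
print(ranking)
```

With these made-up numbers the ranking comes out a3, a4, a1, a2, with a2's negative score marking it as contradicting correct predictions on average.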

 
Now, to observe the effect of keeping only supporting attributes, I removed the attributes identified above as contradicting correct predictions and ran the models again. From the results, the Decision Tree and Gradient Boosted Trees performance improved, the Random Forest performance was unchanged, and the Neural Network performance decreased. In machine learning we try many different things, as there are no fixed rules for getting better predictions.

Comments and feedback are much appreciated.

Thanks
Regards,
Varun

Comments

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,560  RM Founder
    Hey Varun,
    Thanks for the discussion and the thoughts.  I would like to provide some comments on some of the aspects you have mentioned.
    Now, I first observed the importance of attributes in Auto Model and found that a2 is the best predictor; as you can see in the figure below, it is represented in green. The other three attributes are in yellow, which means they have a medium impact on model predictions.
    That is actually not the meaning of those colors.  I have pasted the help section on the colors below as a spoiler.

    The colored status bubble provides a quality indicator for a data column.

    • Red: A red bubble indicates a column of poor quality, which in most cases you should remove from the data set. Red can indicate one of the following problems:
      • More than 70% of all values in this column are missing,
      • The column is practically an ID with (almost) as many different values as you have rows in your data set but does not look like a text column at the same time (see below),
      • The column is practically constant, with more than 90% of all values being the same (stable), or
      • The column has a correlation of lower than 0.0001% or higher than 95% with the label to predict (if a label is existing).
    • Yellow: A yellow bubble indicates a column which behaves like an ID but also looks like text, or which has either a very low or a very high correlation with the target column. The correlation-based yellow bubbles can only appear if the task is "Predict".
      • ID which looks like text: this column has a high ID-ness and would be marked as red but at the same time has a text-ness of more than 85%.
      • Low Correlation: a correlation of less than 0.01% indicates that this column is not likely to contribute to the predictions. While keeping such a column is not problematic, removing it may speed up the model building.
      • High Correlation: a correlation of more than 40% may be an indicator for information you don't have at prediction time. In that case, you should remove this column. Sometimes, however, the prediction problem is simple, and you will get a better model when the column is included. Only you can decide.
    So green does not mean the feature is most important; it simply means it is safe to use for modeling. Yellow, on the other hand, should be checked: in this case, not because of low correlation but because of high correlation.

    A better piece of information for judging a feature's likely importance for the label / the model is the correlation column. If you sort by this column, you will see that the order of importance (correlation with label) is a3, a4, a1, a2. So a2, while safe to use for modeling without additional checks, is in fact likely the least important feature.
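    As a quick illustration of this kind of check, here is a sketch that sorts attributes by their absolute correlation with the label (the numbers are made up, not the actual Iris values):

```python
import pandas as pd

# Toy numeric data standing in for the four Iris attributes; values are
# invented for illustration, with a numeric class encoding as the label.
df = pd.DataFrame({
    "a1": [5.1, 4.9, 6.3, 5.8, 7.1],
    "a2": [3.5, 3.0, 3.3, 2.7, 3.0],
    "a3": [1.4, 1.4, 6.0, 5.1, 5.9],
    "a4": [0.2, 0.2, 2.5, 1.9, 2.1],
    "label": [0, 0, 2, 1, 2],
})

# Absolute correlation of each attribute with the label, highest first.
corr = df[["a1", "a2", "a3", "a4"]].corrwith(df["label"]).abs()
print(corr.sort_values(ascending=False))
```

    In this toy data a2 has by far the weakest correlation with the label, mirroring the situation Ingo describes.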

    Interestingly, the models did very well in the presence of all four attributes compared to their absence. The kappa values increased from 0.3 to 0.9.
    I would recommend re-doing this analysis based on the information above: a3 and a4 are the most important attributes, not a2.

    I compare four classification models (Decision Tree, Randon Forest, Gradient Boosted Tree & Neural Network) performance and identify the importance of attributes in each model for correct predictions.
    I am actually thinking about creating a new operator for calculating feature importance based on the Explain Predictions output as well. I am not sure yet whether focusing only on correct predictions is a good idea. To be honest, I could see an argument for including both sides of the coin, the correct and the wrong predictions. The reason is that the feature value was important for the model regardless of whether the prediction was correct. Is that not what we are after?

    Just my 2c,
    Ingo
  • varunm1 Member Posts: 285   Unicorn
    Thanks, @IngoRM, for your comments. In my view, it is not appropriate to analyze only the correct predictions; we need to take all predictions, both correct and incorrect. I saw a trend where an attribute that strongly supports correct predictions also supports incorrect predictions. If an algorithm's performance is low (that is, there are many incorrect predictions), an attribute that supports both correct and incorrect predictions does more harm than good.
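    One hypothetical way to express this "net" view in code (all numbers made up): credit an attribute for its support on correct predictions and penalize it for its support on incorrect ones.

```python
import pandas as pd

# Hypothetical signed local importances, one row per example.
explanations = pd.DataFrame({
    "a1": [0.10, 0.20, 0.15],
    "a2": [0.25, 0.30, 0.50],   # strongly supports the wrong prediction too
    "label":      [1, 1, 0],
    "prediction": [1, 1, 1],    # the last row is an incorrect prediction
})

is_correct = explanations["label"] == explanations["prediction"]
attrs = ["a1", "a2"]
support_correct = explanations.loc[is_correct, attrs].sum()
support_incorrect = explanations.loc[~is_correct, attrs].sum()

# Net score: support on correct predictions minus support on incorrect ones.
net = support_correct - support_incorrect
print(net.sort_values(ascending=False))
```

    Here a2 has the higher raw support on correct predictions, but its strong support of the incorrect prediction drags its net score below a1's, which is exactly the "more harm than good" effect described above.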

    Correct me if there is any misconception about this.
    Regards,
    Varun
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,560  RM Founder
    I think we are on the same page here.  I was only bringing it up since you mentioned
    ...and identify the importance of attributes in each model for correct predictions.
    So I thought it was a good opportunity to have this discussion quickly :)

  • varunm1 Member Posts: 285   Unicorn
    Thanks, I got it. I am looking forward to your new feature selection operator based on Explain Predictions; this is actually the major reason I posted this thread. :smile:
    Regards,
    Varun
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,560  RM Founder
    Yip, this thread and another one from recent days really made me think about how this could lead to a model-agnostic but model-dependent global feature importance weighting based on the local explanations / importances. Stay tuned...