What is the logic regarding the impact of weights on prediction?

MiguelHH98 Member Posts: 11 Contributor I
edited March 2020 in Help

Hi!
I am working on a project in which I need to predict the value of one variable based on the other variables in a database that I am using as input. The tool I'm using in RapidMiner is Auto Model. Everything went fine when running the models, and the algorithm that came out best was Gradient Boosted Trees, so I focused on that one. In the "Weights" tab, certain variables (let's call them "a", "b" and "c") came out as the most influential or important. I then went to the "Simulator" tab to see how these variables affect the value of my target variable (say "y"). However, the predicted value remains intact when I change them.

I tried modifying the values of the less influential variables to see if any had an impact on "y". While doing this test, I came across two variables ("m" and "n") that did change the value of "y", which seemed strange, since neither of them is as influential as "a", "b" or "c". Another thing I observed and found curious was that, in the "Production Model" tab, most of the trees have these two variables "m" and "n" at their roots, but I don't know what to conclude from that. Could someone please explain why this happens, what the real logic regarding the impact of weights on prediction is, and why certain variables that are hardly influential at all do cause an impact? I hope you can help me. Thanks in advance.

Regards,

Miguel Hinostroza


Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @MiguelHH98

    The weights that you are seeing in Auto Model are "local prediction weights". These are different from global model weights. They are calculated by an operator called "Explain Predictions", which works with a modified version of the Local Interpretable Model-agnostic Explanations (LIME) method.

    These weights are used to explain which attributes are important locally, for individual predictions, rather than on a global scale. I will explain with an example process.

    I created a process that uses split validation (70:30, train:test) on the Deals data from the community samples.

    If you run the attached process you will get two windows in the results screen, one for the model and one for the weights. If you take a close look at the "Description", you can see that Age has huge importance compared to the other two attributes. These values are calculated while the model is trained (on the training data); they are global variable importances.
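    To make the distinction concrete, here is a minimal sketch of reading off global importances after training a gradient boosted trees model. This is not the Auto Model process itself: scikit-learn stands in for the H2O model RapidMiner uses, and the synthetic data and column names (Age, Payment Method, Gender) are placeholders for the Deals sample.

    ```python
    # Minimal sketch (not RapidMiner's internal code): global importances of a
    # gradient boosted trees model, with scikit-learn standing in for H2O.
    # The data is synthetic; Age / Payment Method / Gender are placeholders.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 1000
    age = rng.uniform(18, 80, n)
    payment = rng.integers(0, 2, n)
    gender = rng.integers(0, 2, n)
    X = np.column_stack([age, payment, gender])
    y = (age > 45).astype(int)  # label driven almost entirely by Age

    # 70:30 split, mirroring the split validation in the example process
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.7, random_state=0)

    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    # Global importances: computed from the training data only, by summing
    # how much each attribute improves the splits across all trees
    for name, w in zip(["Age", "Payment Method", "Gender"],
                       model.feature_importances_):
        print(f"{name}: {w:.3f}")
    ```

    Running this prints Age with by far the largest importance, because the label was constructed from it.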



    On the other hand, if you look at the attribute weights calculated by the "Explain Predictions" operator, you can observe that "Age" has less weight.



    Why is this? The reason is that the methods used and their interpretations are different. Explain Predictions calculates weights based on the predictions (testing data). This tells you which attributes are important for the individual predictions rather than for the global model, as in the previous case.

    Why do we need this? Not every model can provide global importances, and it is hard to explain the importance of attributes for individual predictions from global importances alone, because individual predictions are much more complex. There are cases where you need to explain each prediction (e.g., medical diagnosis), and remember that any single prediction might be right or wrong. So RapidMiner uses a correlation-based LIME method to calculate attribute weights based on correct and wrong predictions.
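    For intuition, here is a minimal sketch of what a correlation-based local explanation can look like. This is my assumption about the general idea, not the actual implementation of "Explain Predictions"; it reuses model, X_train and X_test from the sketch above, and local_weights is a hypothetical helper name.

    ```python
    # Sketch of a LIME-style local explanation (an assumed simplification,
    # not RapidMiner's actual "Explain Predictions" code)
    import numpy as np

    def local_weights(model, X_train, x, k=50):
        # find the k training points closest to the example being explained
        dists = np.linalg.norm(X_train - x, axis=1)
        neighbors = X_train[np.argsort(dists)[:k]]
        # the model's predicted class-1 probability in that neighborhood
        preds = model.predict_proba(neighbors)[:, 1]
        # correlate each attribute with the local predictions: a large
        # |correlation| means the attribute matters for *this* prediction
        return [np.corrcoef(neighbors[:, j], preds)[0, 1]
                for j in range(neighbors.shape[1])]

    # explain the first test example from the previous sketch
    print(local_weights(model, X_train, X_test[0]))
    ```

    An attribute that dominates the global model can still get a small local weight if it barely varies, or barely changes the prediction, in the neighborhood of the example being explained.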

    @Joanneyu I guess this explanation might help you as well.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MiguelHH98 Member Posts: 11 Contributor I
    Thanks for replying @varunm1.

    I think I now understand the difference between these two types of weights better. But I still don't get why the predicted value doesn't change when I modify the values of the attributes which have the highest weights (for prediction).

    I mean, this attribute has the highest weight:

    but it doesn't cause an impact on the prediction:

    And these attributes, which have lower weights...

    do affect the prediction, for example this:
    I don't understand why this happens; maybe I'm getting something wrong. I hope you can help me. Thanks!

    Regards,

    MH
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @MiguelHH98

    Can you open the process and run it to check what the global attribute importances of the GBT model are? To do this, you just need to click on "Open Process" in Auto Model and then run the process. You will get multiple windows in the results; check the GradientBoosted (Model PO) tab and go to its description, where you will find the variable importances of the GBT.

    If you want me to check and confirm, I need your data to do that. 

    Sometimes the global importances don't match the local attribute weights, which might cause these sorts of behaviors. I am also tagging @IngoRM in case he can add something here.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MiguelHH98 Member Posts: 11 Contributor I
    Hi @varunm1

    Sorry, I'd like to let you check the data yourself, but I'm not allowed to share this information. Nevertheless, I did what you asked, and this makes more sense:
    The first three attributes do cause a reaction in the prediction. So I conclude that I have to pay attention to these weights instead of the AttributeWeights when I want to test the sensitivity of the prediction. Please correct me if I'm wrong.

    My goal in this project is to figure out the combination of values of all these attributes that maximizes the prediction's value. That's why I'm using the simulator. Also, I was using the attribute weights to explain how much each attribute affects the prediction, but now I'm not sure whether these weights are useful in this case, and, if they are, how they could help.

    I look forward to your answer. Thanks!

    Regards,

    MH 
  • MiguelHH98 Member Posts: 11 Contributor I
    Hi @IngoRM, thanks for clarifying these points.

    Just to confirm:

    Do the weights I showed you in my last comment correspond to number 3?

    And do these weights correspond to number 2?
    I await your comments.

    Regards,

    MH
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    "Do the weights I showed you in my last comment correspond to number 3?" Yes.

    "And do these weights correspond to number 2?" They may or may not, as these are model-based weights and can differ from the global weights calculated in 2. The reason is the test data used in the calculation of the global weights in method 2: variations in the test set influence method 2.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MiguelHH98 Member Posts: 11 Contributor I
    Oh sorry, I think I wasn't clear. I meant: does number 2 correspond to the weights shown in the image below?

    I also hope you can answer this question: if I want to test the effects of the attributes on the prediction's value in the Simulator tab, should I take into account the weights of number 2? Because, as I showed you, in this specific case they seem to have nothing to do with it, unlike the weights of numbers 3 and 4, which make more sense. Thanks in advance.

    Regards,

    MH
  • MiguelHH98 Member Posts: 11 Contributor I
    Thank you @varunm1

    One more question: is there any way to know why the software determines those attributes to be the most important (according to number 3)? Thanks.

    Regards,
    MH


  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @MiguelHH98

    The GBT algorithm in Auto Model is H2O based. You can follow the link below to understand how variable importances are calculated for tree-based algorithms.

    http://docs.h2o.ai/h2o/latest-stable/h2o-docs/variable-importance.html
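    If you want to inspect these importances outside Auto Model, here is a minimal sketch using the standard H2O-3 Python API. The file name, attribute names and label below are hypothetical placeholders for your own data.

    ```python
    # Sketch: train an H2O GBM directly and print its variable importances.
    # "deals.csv" and the column names are placeholders, not real paths.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()
    frame = h2o.import_file("deals.csv")
    train, test = frame.split_frame(ratios=[0.7], seed=0)

    gbm = H2OGradientBoostingEstimator(seed=0)
    gbm.train(x=["Age", "Payment Method", "Gender"],
              y="Future Customer", training_frame=train)

    # Relative, scaled and percentage importances, aggregated from how much
    # each variable improves the squared error over all splits in all trees
    print(gbm.varimp(use_pandas=True))
    ```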


    Hope this helps. 
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing
