"Important factors for prediction": how do you work?
up201712146
Member Posts: 1 Learner I
Hello,
I'm using Random Forest and Boosted Trees from AutoModel to prioritize the variables I'll use when modeling with neural networks, so it is essential for me to know the "importance" of each input variable. AutoModel provides "Important factors for prediction", but I don't know how it works. I thought it was based on correlation, but in that case it should be independent of the type of model, and yet Random Forest and Boosted Trees produce different results. Moreover, the results also differ before and after optimization.
My question is: how is the importance of factors calculated?
Thank you.
Answers
The variable importance is calculated by each model in its own way. For example, a Random Forest consists of trees, and each tree either contains a given attribute or doesn't, at different positions inside the tree. A variable with good predictive power will end up in more trees and in more prominent positions.
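To make the "more trees, more prominent position" idea concrete, here is a minimal Python sketch. It is not the formula AutoModel or the Random Forest operator uses: the toy forest, the 1/(depth + 1) weighting, and the function name position_weighted_importance are all assumptions made for illustration. Real implementations typically also take into account how much each split improves the split criterion.

```python
from collections import defaultdict

# Hypothetical toy forest: a split node is (attribute, left, right); a leaf is None.
forest = [
    ("age", ("income", None, None), ("region", None, None)),
    ("age", ("region", None, None), None),
    ("income", ("age", None, None), None),
]


def position_weighted_importance(trees):
    """Score each attribute by how often and how close to the root it is used."""
    scores = defaultdict(float)

    def visit(node, depth):
        if node is None:  # leaf: nothing to score
            return
        attribute, left, right = node
        # Assumed weighting: a root split counts 1, a depth-1 split 1/2, and so on.
        scores[attribute] += 1.0 / (depth + 1)
        visit(left, depth + 1)
        visit(right, depth + 1)

    for tree in trees:
        visit(tree, 0)

    # Normalize so that the scores sum to 1.
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


importance = position_weighted_importance(forest)
for name, score in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

On this toy forest, "age" gets the highest score (0.50) because it is the root split in two of the three trees, followed by "income" (0.30) and "region" (0.20). A different forest, for example one built with other parameters after optimization, would give different scores.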
A linear regression model, for instance, would look at the standardized coefficients.
There are "Weight by ..." operators that can give you variable importances based on correlation, information gain, gain ratio etc. These might be similar to the weights you're getting from your models but they're not the same.
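As a rough illustration of what a correlation-based weighting does, the following Python sketch scores each attribute by the absolute Pearson correlation with the label, one attribute at a time. The data, the attribute names, and the helper functions are made up, and the sketch does not claim to reproduce what the Weight by Correlation operator does internally.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_weights(attributes, label):
    """Weight each attribute by |correlation with the label|, ignoring all other attributes."""
    return {name: abs(pearson(values, label)) for name, values in attributes.items()}


# Made-up example data: three attributes and a numeric label.
attributes = {
    "a": [1, 2, 3, 4, 5],
    "b": [5, 3, 4, 1, 2],
    "c": [2, 2, 3, 2, 3],
}
label = [2, 4, 5, 4, 5]

weights = correlation_weights(attributes, label)
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.3f}")
```

Because every attribute is scored on its own, these weights do not depend on any model, which is why they can differ from the importances a Random Forest or Boosted Trees model reports.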
Regards,
Balázs
As @BalazsBarany said, the various "Weight by" operators (e.g., Weight by Correlation, Weight by Information Value) are good for finding the univariate strength of relationships between individual attributes and your label. However, that does not mean those are the most important in a multivariate model context because of the potential overlap of information (e.g., multicollinearity in the linear regression context). Likewise, the "variable importance" measures that are provided by individual machine learning operators do not necessarily show you the attributes with the strongest individual relationships with the label, but rather those in the context of that specific model with the other attributes that are available. This is an important distinction to keep in mind.
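A small Python sketch can illustrate this distinction. It uses made-up data in which the label is driven by x1 only and x2 is a slightly shuffled copy of x1, and it computes the standardized coefficients of a two-predictor linear regression from the pairwise correlations (the textbook closed form for two predictors). The variable names and data are assumptions for illustration, not output from RapidMiner.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: the label is exactly 2 * x1; x2 is x1 with its last two values swapped.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 2, 3, 4, 6, 5]
label = [2, 4, 6, 8, 10, 12]

# Univariate view: each attribute against the label, one at a time.
r1y = pearson(x1, label)
r2y = pearson(x2, label)
# Overlap between the two attributes.
r12 = pearson(x1, x2)

# Multivariate view: standardized coefficients of the regression on x1 and x2 together.
denominator = 1 - r12 * r12
beta1 = (r1y - r12 * r2y) / denominator
beta2 = (r2y - r12 * r1y) / denominator

print(f"x1: correlation {r1y:.2f}, standardized coefficient {beta1:.2f}")
print(f"x2: correlation {r2y:.2f}, standardized coefficient {beta2:.2f}")
```

Here x2 has a correlation of about 0.94 with the label, yet its standardized coefficient is essentially zero, because x1 already carries the same information. A univariate weighting would rank x2 as nearly as important as x1, while the model-based measure would not.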
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts