Hello, I want to ask about the Gradient Boosted Trees model that I used for my study on corporate default risk. My dependent variable is default versus non-default, coded as 1 for default and 0 for non-default, and I have already set its data type to binominal. After I run the related operators, such as Select Attributes, Set Role, and Cross Validation, none of the resulting trees show at the end of their branches whether the outcome is 1 or 0, as I assigned. Below I share one of the Gradient Boosted Trees models.
Figure 1: Tree for Gradient Boosted Model
Gradient Boosted Trees don't show the dependent variable in the resulting trees
However, I also tried other models, such as decision tree and random forest, and they give the desired result. Below are the decision tree and random forest models.
Figure 2: Tree for Decision Tree
Figure 3: Tree for Random Forest
So, I want to ask: why do the 0 and 1 that I assigned not show up at the end of the branches in the Gradient Boosted Trees model, when they do show up in the other models? And is there an operator that I need to add to the process? I hope someone can help me find a way to solve this problem. Any help is welcome. Thank you in advance.
Answers
I don't think you are going to get what you are looking for with gradient boosting. These models are more difficult to interpret, and I don't think any data preparation you do before modeling will produce that sort of output. With a decision tree or a random forest, each tree makes an independent prediction for the label. With gradient boosting for a categorical label, each tree instead outputs something more like an adjustment to the log-odds of the positive class, which can't easily be interpreted on its own. If you have several categories, it will create different trees with different positive classes. In situations like this, people will sometimes use the GBT for its predictive power, but compare its performance and results with other models and rely more on those other models for interpretability.
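To make the log-odds point concrete, here is a minimal pure-Python sketch of gradient boosting for a binary label. It is not RapidMiner's implementation: the `debt_ratio` feature, the toy data, the one-split "stumps", and the learning rate are all assumptions made up for illustration, and real implementations typically compute leaf values differently. What it tries to show is that each tree's leaves hold small real-valued corrections rather than 0 or 1, and that a class only appears after summing all trees and applying the sigmoid.

```python
import math

LEARNING_RATE = 0.5  # hypothetical value, chosen only for this sketch
N_TREES = 20         # hypothetical value

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals (least squares)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val = sum(left) / len(left)
        right_val = sum(right) / len(right)
        sse = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    return best[1:]  # (threshold, left leaf value, right leaf value)

def stump_output(stump, x):
    threshold, left_val, right_val = stump
    return left_val if x <= threshold else right_val

def fit_gbt(xs, ys):
    """Boost stumps on the gradient of the log-loss (label minus probability)."""
    p = sum(ys) / len(ys)
    base = math.log(p / (1.0 - p))  # initial log-odds of the positive class
    scores = [base] * len(xs)
    stumps = []
    for _ in range(N_TREES):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [s + LEARNING_RATE * stump_output(stump, x)
                  for s, x in zip(scores, xs)]
    return base, stumps

def predict_log_odds(base, stumps, x):
    """Final score = base log-odds + scaled sum of every tree's leaf value."""
    return base + LEARNING_RATE * sum(stump_output(s, x) for s in stumps)

def predict_proba(base, stumps, x):
    return sigmoid(predict_log_odds(base, stumps, x))

# Toy data (made up): a single feature and a 0/1 default label.
debt_ratio = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
default = [0, 0, 0, 1, 0, 1, 1, 1]

base, stumps = fit_gbt(debt_ratio, default)

# Leaves of the first few trees: real-valued corrections, not class labels.
for threshold, left_val, right_val in stumps[:3]:
    print(f"split at {threshold:.2f}: left leaf {left_val:+.3f}, right leaf {right_val:+.3f}")

# Only the summed score, passed through the sigmoid, gives a probability of default.
for x in (0.1, 0.9):
    print(f"debt_ratio={x}: P(default)={predict_proba(base, stumps, x):.3f}")
```

Looking at any single tree here tells you only how much that tree nudges the score up or down, which is why the tree view for a boosted model does not end in 1 or 0 the way a decision tree or random forest does.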
Best,
Jeff