
Gradient Boosted Trees don't show the dependent variable in the resulting trees

sbrnae Member Posts: 2 Newbie
edited July 2023 in Help
Hello, I want to ask about the Gradient Boosted Trees model that I used in my study on corporate default risk. My dependent variable is default vs. non-default, and I use 1 for default and 0 for non-default. I have already set the data type to binominal for the default/non-default attribute. After I add the related operators such as Select Attributes, Set Role, and Cross Validation, none of the trees in the result show at the end of their branches whether the outcome is 1 or 0 as I assigned before. Below I share one of the Gradient Boosted Trees models.
Figure 1: Tree for Gradient Boosted Model

However, I tried other models such as Decision Tree and Random Forest, and they give the desired result. Below are the attachments of the Decision Tree and Random Forest models.

Figure 2: Tree for Decision Tree

Figure 3: Tree for Random Forest

So, I want to ask: why do the 0 and 1 that I assigned not show up at the end of the branches in the Gradient Boosted Trees model, while they do show up in the other models? Is there any operator that I need to add to the process? I hope someone can help me find a way to solve this problem. Any help is welcome. Thank you in advance.

Answers

  • jmergler Administrator, Moderator, Employee, RapidMiner Certified Analyst, Member, University Professor Posts: 41 Guru
    edited July 2023
    Hi @sbrnae
    I don't think you are going to get what you are looking for with gradient boosting. These models are more difficult to interpret, and I don't think any data preparation you do before modeling will allow for that sort of output. With a Decision Tree or a Random Forest, each tree makes an independent prediction for the label, so the leaves show the class. With gradient boosting on a categorical label, each tree's leaf value is more like a contribution to the log-odds of the positive class, which can't easily be interpreted on its own. If you have several categories, it will create different trees with different positive classes. In situations like this, people often use the GBT for its predictive power, compare its performance and results with other models, and rely more on those other models for interpretability.
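    To make that difference concrete, here is a quick sketch of my own in Python with scikit-learn (an assumption on my part; RapidMiner's Gradient Boosted Trees operator is H2O-based, but the idea is the same): the trees inside a gradient boosted classifier are regression trees whose leaves hold real-valued log-odds contributions, while a plain decision tree's leaves hold class distributions you can read a 0/1 label from.

        # Illustration only: compares what lives in the leaves of a single
        # decision tree vs. the trees inside a gradient boosted classifier.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=200, n_features=5, random_state=0)

        # Plain decision tree: each leaf stores a class distribution,
        # so a 0 or 1 prediction can be read directly from the leaf.
        dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
        dt_leaves = dt.tree_.children_left == -1
        print(dt.tree_.value[dt_leaves])          # class distributions per leaf

        # Gradient boosted trees: each stage is a regression tree whose leaf
        # values are real numbers (corrections to the log-odds), so no 0/1
        # class label appears at the end of the branches.
        gbt = GradientBoostingClassifier(n_estimators=10, max_depth=3,
                                         random_state=0).fit(X, y)
        first_tree = gbt.estimators_[0, 0]        # regression tree of stage 1
        gbt_leaves = first_tree.tree_.children_left == -1
        print(first_tree.tree_.value[gbt_leaves]) # real-valued leaf scores

    So the numeric values you see at the ends of the GBT branches are expected behavior, not a setup problem in your process.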
    Best,
    Jeff 