Using Gradient Boosted Tree Output

kylejohnson · March 2019

Hello,

New user here. Sorry if this has already been asked, but I can't find an answer anywhere. The simple version of the question is: how do I use the output model of the GBT? What do the numbers on the leaves mean? Why are there 60 trees in the model, and how are they all used together in application?

To give a little background that may or may not be helpful, I am a stock trader and have constructed an indicator for short-term price movement. This indicator works very well sometimes and is not useful at others. I am trying to determine if there are patterns that can give me a better idea of when the indicator will work and when it won't. My attributes are all numerical values that are part of the indicator, and the label is "yes" if that particular prediction of stock movement was useful. My ultimate goal is to use RapidMiner to figure out when to listen to my indicator and when not to, and then to put that insight back into the trading indicator itself.

Thank you in advance for your time and insight,

Kyle

Telcontar120 · March 2019

GBT is an ensemble method, so there are multiple trees by design. In fact, the number of trees is a parameter setting, so you can control it. No single tree is really useful or interpretable in this context. The entire set of trees must be used to make the prediction.
GBT is not a method that is suitable for simple explanations. If you want that, you can try the simpler Decision Tree operator, but you may see a significant deterioration in performance. If instead you want to use GBT, you will need to score future records in RapidMiner using that model and then rely on the prediction. These algorithms are somewhat "black box" in nature.

varunm1 · March 2019

Hello @kylejohnson

As mentioned by Telcontar120, multiple trees are built one after another, based on the parameters set in the operator.

Working:

First, the operator builds one decision tree, which can have multiple leaf nodes, each holding a value. The difference between that tree's predictions and the actual label values is taken as the error. The next tree is then built in such a way that this error is reduced.

The outputs of one tree are not used as inputs to the next, but the error left by the trees built so far is taken into account when building the next tree, so that the combined prediction deviates less from the actual values. There is a simple video which explains this.

https://www.youtube.com/watch?v=ErDgauqnTHk
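
The idea described above can be sketched in a few lines of Python. This is only an illustration of generic gradient boosting with squared error on a single numeric attribute, not RapidMiner's actual implementation. The function names, the one-split "stump" trees, and the default learning rate are all assumptions made for the example; it also assumes the attribute has at least two distinct values.

```python
# Illustrative sketch of gradient boosting for a numeric target (squared error).
# All names here are hypothetical; this is not the RapidMiner operator's code.

def fit_stump(xs, residuals):
    """Fit a one-split tree: pick the threshold whose two leaf means best fit the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]  # (threshold, left leaf value, right leaf value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, n_trees=60, learning_rate=0.1):
    """Build trees one after another, each fit to the error left by the previous ones."""
    base = sum(ys) / len(ys)           # starting prediction: the mean of the target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # The "error" each new tree sees: actual value minus current combined prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a damped version of the new tree's output to the running prediction.
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
```

Calling `fit_boosted(xs, ys, n_trees=1)` and then `fit_boosted(xs, ys, n_trees=60)` on the same data and comparing `mse` shows the training error shrinking as trees are added, which is the behaviour described in the post.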

Hope this helps you get an understanding. @Telcontar120 correct me if there is any misconception

kylejohnson · March 2019

Telcontar,

Thank you, that makes more sense. Do I have the correct basic understanding (I apologize for any incorrect terminology)?

When a new example is run through the model, it is put into "Tree 1", which gives it an output value "Leaf 1", then into "Tree 2", which gives it another output value "Leaf 2", and so on until "Tree N" and "Leaf N". Are all of the "Leaf Values" then added up? How does the model arrive at a final output?

Again thank you in advance,

Kyle
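
For gradient boosting in general, the picture in the question above is essentially right: each tree contributes one leaf value, and the values are added up. For a yes/no label, the sum is usually treated as a raw score (log-odds) and passed through a logistic function to get a confidence. The sketch below assumes exactly that; whether a starting value is added, and whether the learning rate is already folded into the leaf values, depends on the implementation, so check the exported model before relying on it. The attribute names, thresholds, and leaf values are made up for illustration.

```python
import math

# Hypothetical trees: each maps an example (a dict of attribute values) to ONE leaf value.
def tree_1(example):
    return 0.30 if example["momentum"] > 0.5 else -0.20

def tree_2(example):
    return 0.15 if example["volume_ratio"] > 1.2 else -0.10

def tree_3(example):
    return 0.05 if example["momentum"] > 0.8 else -0.05

def score(example, trees, initial_value=0.0):
    """Sum the leaf value from every tree, then convert the sum to a confidence."""
    # Assumption: leaf values are additive log-odds contributions.
    raw = initial_value + sum(tree(example) for tree in trees)
    confidence_yes = 1.0 / (1.0 + math.exp(-raw))  # logistic function
    label = "yes" if confidence_yes >= 0.5 else "no"
    return raw, confidence_yes, label

trees = [tree_1, tree_2, tree_3]
raw, confidence_yes, label = score({"momentum": 0.9, "volume_ratio": 1.5}, trees)
```

For the example record, the three leaf values are 0.30, 0.15 and 0.05, so the raw score is 0.50 and the confidence for "yes" comes out at roughly 0.62. To move such a model back into a trading indicator, this summing step is the part that would need to be reproduced for all trees, which is why scoring inside the tool is usually simpler.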
