How does RM Decision Tree algorithm choose between two attributes that produce equally good splits?

rick_davies · May 2017

It seems possible that a DT algorithm will find cases that are split equally well into two different groups (branches) by say two different case attributes. How does it choose one attribute to use, in these circumstances? And what can we say about the likely structure of the tree that would have been made had the neglected attribute been used? I have not seen anything like this discussed in the literature on DTs

IngoRM · May 2017

Well you could do this. However, I just do not see a chance for a massive improvement here to be honest. See my answer from before - I don't think that there really is a huge risk given that the whole tree is built based on choices which might be sub-optimal...

Marco_Boeck · May 2017

Hi,

looking at the code, I can say that in such a case, we simply use whichever attribute comes first in the data.

Regards,

Marco

rick_davies · May 2017

...which is what some others have told me.

In which case, how do we know that the other attribute, if it was used instead, does not lead on to a better performing DT structure?

Thomas_Ott · May 2017

You could re-order the attributes and test it?

IngoRM · May 2017

Interesting topic indeed. We do not know if the other attribute would not lead to a better DT. But having said this, this is actually true for all of the splits. Keep in mind that picking the one with the biggest information gain is a heuristic as well. There is no guarantee that a pick is leading to an optimal DT (optimal = best predictive power). So it could easily be that making a different pick at each split, let's say the one with the 2nd or 3rd best information gain, might lead to a more accurate tree. But the only way to find this out is to go bruteforce through all options which is not computationally feasible. And even then you might overfit to your test set you use to find out which split is the best.

So to summarize: using information gain (and the other approaches) is a heuristic itself and does not guarantee an optimal tree to begin with. Making an arbitrary split between two good attributes does not make this heuristic much more, well, "heuristic-y" in my opinion ;-)

Cheers,

Ingo

rick_davies · May 2017

I could do this, where I know that two attributes have been equal contenders

But how do I find these in the first instance? Or otherwise be confident they dont exist?

Perhaps for binary data, some sort of Hamming distance measure of similarity between all attributes could point to where there is such a risk?

rick_davies · May 2017

Would some sort of "prior to DT analysis" feature selection process that maximised diversity among the attributes be a best bet against such a risk?

rick_davies · October 2017

If this is the case then BigML should signal to the users when two or more attributes perform equally well, so they can explore what happens when one of these is deleted from the data set

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

How does RM Decision Tree algorithm choose between two attributes that produce equally good splits?

Best Answer

Answers