"Problem with decision tree algorithm"
I tried to run the Decision Tree algorithm in RapidMiner, and it does not seem to give a correct result. I am not sure whether the problem lies in the implementation of the algorithm or has some other cause. Below is the exercise that I tried to run with RM.
I use the following data (A and B are nominal, binary attributes, and there are two classes: + and -):
I want to build a decision tree using the Gini index as the splitting criterion. RapidMiner selects attribute A as the best one for splitting. However, when I do the calculations by hand, B seems to be better. Do you know where the difference comes from? Below are my calculations:
The overall Gini index before splitting is:
$G_{\mathrm{orig}} = 1 - 0.4^2 - 0.6^2 = 0.48$
The gain in Gini after splitting on A is:
$G_{A=T} = 1 - (4/7)^2 - (3/7)^2 = 0.4898$
$G_{A=F} = 0$
$\Delta = G_{\mathrm{orig}} - \frac{7}{10} G_{A=T} - \frac{3}{10} G_{A=F} = 0.1371$
The gain in Gini after splitting on B is:
$G_{B=T} = 1 - (1/4)^2 - (3/4)^2 = 0.3750$
$G_{B=F} = 1 - (1/6)^2 - (5/6)^2 = 0.2778$
$\Delta = G_{\mathrm{orig}} - \frac{4}{10} G_{B=T} - \frac{6}{10} G_{B=F} = 0.1633$
Therefore, attribute B should be chosen to split the node (and not A as calculated by RM).
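For anyone who wants to check the hand calculation independently of RapidMiner, here is a minimal Python sketch. It is not RapidMiner's implementation. The class counts are an assumption: they are read back from the fractions in the calculations (4/7 and 3/7, 1/4 and 3/4, and so on), not from the data table itself, and the order of the two classes inside each pair is arbitrary because the Gini index is symmetric in the classes. The function names `gini` and `gini_gain` are made up for this example.

```python
def gini(counts):
    """Gini index of a node, given the number of records in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(parent, children):
    """Parent Gini minus the size-weighted Gini of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


# Assumed class counts, inferred from the fractions in the post.
parent = (4, 6)               # 10 records, class proportions 0.4 and 0.6
split_a = [(4, 3), (0, 3)]    # A=T has 7 records, A=F has 3 (pure node)
split_b = [(3, 1), (1, 5)]    # B=T has 4 records, B=F has 6

gain_a = gini_gain(parent, split_a)
gain_b = gini_gain(parent, split_b)

print(f"Gini before split: {gini(parent):.4f}")
print(f"Gain for A: {gain_a:.4f}")
print(f"Gain for B: {gain_b:.4f}")
```

With these counts the script gives the same values as the manual calculation (about 0.1371 for A and 0.1633 for B), so the arithmetic itself is consistent and B has the larger Gini gain under this definition.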