RapidMiner's Decision Tree tries all possible split values of a numeric attribute and selects the value which produces the best split with respect to the selected criterion.
It uses all values that are in the data set, right?
Answers
RapidMiner's Decision Tree tries all possible split values of a numeric attribute and selects the value which produces the best split with respect to the selected criterion.
Best,
Marius
An implementation that sorts the values first can probably be even faster,
because you know an optimal split point always lies halfway between two adjacent data points.
For example, assume you have a numerical axis from left to right (attribute x) and labels A and B (class attribute).
AAAAAAAAAAA|BBBBBBBBBBBBBBBBBBB
-----------|-------------------> x-axis
           v
  optimal split point
As far as I'm aware, you cannot draw a picture where the optimal split point does not lie halfway between two neighbouring data points.
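To make the sorting idea concrete, here is a minimal Python sketch. It is not RapidMiner's actual code: the names `entropy` and `best_split` are made up for illustration, and I assume information gain as the criterion and points spaced evenly at x = 0, 1, 2, ... It sorts the data once, keeps running class counts, and only tests the midpoints between neighbouring values.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (in bits) of a label distribution given as counts."""
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def best_split(xs, labels):
    """Return (threshold, gain) of the best binary split 'x <= threshold'."""
    pairs = sorted(zip(xs, labels))          # sort once by attribute value
    n = len(pairs)
    left = Counter()                          # class counts left of the cut
    right = Counter(label for _, label in pairs)
    parent = entropy(right, n)
    best = (None, 0.0)
    for i in range(n - 1):
        x, label = pairs[i]
        left[label] += 1                      # move one point to the left side
        right[label] -= 1
        next_x = pairs[i + 1][0]
        if x == next_x:
            continue                          # no threshold fits between equal values
        n_left, n_right = i + 1, n - i - 1
        children = (n_left / n * entropy(left, n_left)
                    + n_right / n * entropy(right, n_right))
        gain = parent - children
        if gain > best[1]:
            best = ((x + next_x) / 2, gain)   # candidate is the midpoint
    return best


# The picture above: 11 A's followed by 19 B's.
xs = list(range(30))
labels = ["A"] * 11 + ["B"] * 19
threshold, gain = best_split(xs, labels)
print(threshold)  # 10.5
```

After the sort, the scan is a single pass, so the sort dominates the running time. Any threshold strictly between two neighbouring values gives the same partition of the training data; taking the midpoint is just the usual convention.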
ABABABABABABABABABAB
------------------------------------>
x-axis
There is no useful split point here. Splitting on x provides (almost) 0 information gain: exactly 0 if you cut after an even number of points, and only a tiny gain otherwise.
AAA|BBA
---|---> x-axis
   v
optimal split point
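The last two pictures can be checked numerically. The sketch below assumes information gain as the criterion and that the labels are already sorted by x; `entropy` and `gain_profile` are hypothetical helper names, not anything from RapidMiner. It lists the gain for a cut after each position.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(labels.count(c) / n * log2(labels.count(c) / n)
                for c in set(labels))


def gain_profile(labels):
    """Information gain of a cut after position i (i = 1 .. n-1)."""
    n = len(labels)
    parent = entropy(labels)
    return [
        parent - (i / n * entropy(labels[:i]) + (n - i) / n * entropy(labels[i:]))
        for i in range(1, n)
    ]


# Alternating picture: 20 points, A and B taking turns.
gains = gain_profile("AB" * 10)
print(gains[9])    # cut in the middle, both halves are 5 A / 5 B: 0.0
print(max(gains))  # small but positive (reached by cutting off a single end point)

# Second picture: the best cut is after the third point.
small = gain_profile("AAABBA")
best_index = max(range(len(small)), key=small.__getitem__)
print(best_index)  # 2, i.e. the cut between AAA and BBA
```

For the alternating picture the largest gain is only about 0.05 bits, so no cut is really worth making.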
I hope you get the idea.
By the way, you can actually create data sets like this and see where it splits!
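If you want to try this, a small script along these lines can turn such a picture into a data set. This is only a sketch: `picture_to_csv` is a made-up helper, the column names `x` and `label` are my own choice, and the points are assumed to sit at x = 0, 1, 2, ...

```python
import csv
import io


def picture_to_csv(picture):
    """Turn a label string such as "AAABBA" into CSV text with columns x,label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["x", "label"])
    for x, label in enumerate(picture):   # one row per point, left to right
        writer.writerow([x, label])
    return buffer.getvalue()


print(picture_to_csv("AAABBA"))
```

Save the text to a file, import it, mark `label` as the class attribute, and look at the threshold in the resulting tree. With such tiny data sets the pruning and minimal-size settings may prevent any split at all, so you may have to relax them.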