03-04-2014 05:49 PM

9 REPLIES

03-05-2014 08:08 AM

If you have strongly imbalanced data, do not use a decision tree.

03-10-2014 09:11 PM

Thanks for your reply. I ask because I know there are algorithms that address the imbalance problem for decision trees, and I am not sure which decision tree algorithm RapidMiner uses. Is it C4.5 or something else?

03-11-2014 09:01 AM

Hi,

I am not sure which implementation the RapidMiner decision tree uses; I suppose it is something similar to C4.5. If you want to be sure you are using C4.5, you can use W-J48 from the Weka Extension. That operator is a free implementation of C4.5.

03-11-2014 06:40 PM

06-23-2014 03:42 AM

I suppose that, depending on the criterion you select in the parameter settings of the Decision Tree operator, RM produces a different tree using a different algorithm, such as C4.5.

06-23-2014 04:16 AM

06-23-2014 07:34 AM

06-23-2014 07:39 AM

Well, as I said, it's similar to C4.5. In each node, the split attribute is chosen by iterating over all attributes, finding the best split for each attribute with respect to the splitting criterion, and then using the attribute whose best split maximizes that criterion.

For nominal attributes, one branch is always created for each value. For numerical/date attributes, a binary split is always performed. To find the best split value, all possible values in the training data are tried.
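To make the split selection described above a bit more concrete, here is a rough Python sketch. It is not RapidMiner's code. It assumes information gain as the criterion (RapidMiner lets you choose others), treats each observed numerical value as a candidate threshold with a `<=` test, and all function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, partitions):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partitions if p)
    return entropy(labels) - remainder

def best_split_for_attribute(rows, labels, attr):
    """Return (gain, threshold); threshold is None for nominal attributes."""
    values = [r[attr] for r in rows]
    if all(isinstance(v, (int, float)) for v in values):
        # Numerical attribute: binary split, try every value seen in the data.
        best = (float("-inf"), None)
        for t in sorted(set(values)):
            left = [y for v, y in zip(values, labels) if v <= t]
            right = [y for v, y in zip(values, labels) if v > t]
            if not left or not right:
                continue  # a split must put examples on both sides
            gain = info_gain(labels, [left, right])
            if gain > best[0]:
                best = (gain, t)
        return best
    # Nominal attribute: one branch per distinct value.
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    return info_gain(labels, list(groups.values())), None

def choose_split(rows, labels, attrs):
    """Return (attribute, gain, threshold) for the attribute with the highest gain."""
    scored = [(a, *best_split_for_attribute(rows, labels, a)) for a in attrs]
    return max(scored, key=lambda s: s[1])

# Tiny made-up data set: temp separates the classes perfectly, outlook does not.
rows = [
    {"outlook": "sunny", "temp": 30}, {"outlook": "sunny", "temp": 28},
    {"outlook": "rain", "temp": 18}, {"outlook": "rain", "temp": 29},
    {"outlook": "overcast", "temp": 20}, {"outlook": "overcast", "temp": 17},
]
labels = ["no", "no", "yes", "no", "yes", "yes"]
attr, gain, threshold = choose_split(rows, labels, ["outlook", "temp"])
print(attr, threshold)
```

On this toy data the numerical attribute wins with a threshold of 20. Note that plain information gain tends to favor nominal attributes with many values, which is why C4.5 itself uses gain ratio.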

The procedure is repeated until you have pure leaves or one of the pre-pruning conditions is met. Then optionally some post-pruning is applied.
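The stopping logic can be sketched roughly like this. Again, this is only an illustration and not the RapidMiner implementation: it handles numerical attributes only, uses Gini impurity as the criterion, and `max_depth` and `min_size` are hypothetical stand-ins for pre-pruning parameters. Post-pruning is left out.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(X, y):
    """Return (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(X, y, depth=0, max_depth=5, min_size=2):
    """Recursively grow a tree until leaves are pure or pre-pruning stops it."""
    majority = Counter(y).most_common(1)[0][0]
    # Stop on a pure node, or when a (hypothetical) pre-pruning condition is met.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_size:
        return {"leaf": majority}
    split = best_binary_split(X, y)
    if split is None:  # no split improves the criterion
        return {"leaf": majority}
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left],
                     depth + 1, max_depth, min_size),
        "right": grow([X[i] for i in right], [y[i] for i in right],
                      depth + 1, max_depth, min_size),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

# Made-up one-attribute data set.
X = [[1.0], [2.0], [3.0], [4.0]]
y = ["a", "a", "b", "b"]
tree = grow(X, y)
print(predict(tree, [1.5]), predict(tree, [3.5]))
```

Setting `max_depth=0` in this sketch turns the whole tree into a single majority-class leaf, which shows how a pre-pruning condition cuts growth short before the leaves are pure.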

06-23-2014 09:49 AM