"comparing decision trees"

karen
edited May 2019 in Help
Hi! I'm generating decision trees while varying the input parameters: criterion, maximal depth and confidence. To compare the resulting decision trees I'm mostly looking at:
-Class frequencies, because I'm interested in the most frequent classes obtained.

But RapidMiner offers, for example, 4 criteria (gain_ratio, information_gain, gini_index, accuracy) for the Decision Tree (I'm not working with multiway or weight-based decision trees, ID3 or CHAID). Each of them generates trees with different class frequencies, accuracy, recall and precision.

I was wondering if there is some kind of framework for comparing these trees. For example, I can obtain classes with high frequencies but lower precision, or with lower frequencies and higher accuracy. How can these trees be compared?

Regarding the usability of the results:
If following one branch of the tree I get some frequency, but following another I get a slightly higher frequency that involves more variables, maybe the latter is better because it's more informative. But how could I compare them? Which branch is "better" if they are all slightly different, or if they are all quite similar in frequency?



    wessel
    Gain_ratio, information_gain, gini_index, and accuracy are measures used to decide which attribute to split on, given the current split of the dataset.
    These are parameters of the decision tree learner.
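    To make the difference between the four criteria concrete, here is a minimal sketch that scores a single candidate split with textbook definitions of each measure. It is not RapidMiner's source code: the function names are hypothetical, and RapidMiner's internal formulas (in particular what its "accuracy" criterion computes, and whether Gini is reported as a reduction) may differ in details. Here "accuracy" is assumed to mean the fraction of examples that would be classified correctly if each branch predicted its majority class.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Score one candidate split with four textbook criteria.

    parent:   class labels at the node before splitting
    children: one list of class labels per branch after splitting
    """
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    # Information gain: drop in entropy from parent to weighted children.
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    # Gain ratio divides by the entropy of the branch sizes (split information),
    # which penalizes splits into many small branches.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    # Gini criterion expressed as a reduction in Gini impurity.
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # Accuracy if every branch predicts its own majority class (assumed definition).
    correct = sum(Counter(ch).most_common(1)[0][1] for ch in children if ch)
    return {
        "information_gain": info_gain,
        "gain_ratio": gain_ratio,
        "gini_index": gini_gain,
        "accuracy": correct / n,
    }
```

    The learner evaluates every candidate attribute this way and splits on the best-scoring one; because the criteria rank candidates differently, the same data can yield differently shaped trees.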

    Accuracy, precision and recall are performance measures computed on some test set. RapidMiner provides them in the Performance (Classification) operator.
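    The question mixes "class frequency" with these performance measures, so the following is a small sketch (the function name is hypothetical) that computes accuracy, plus precision and recall for one class treated as positive, from true and predicted labels on a test set. It also reports how often that class is predicted, which shows how a class can be predicted frequently while still having low precision.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, plus precision/recall/predicted frequency for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when the class is never predicted / present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        # Share of test examples the model assigns to this class.
        "predicted_frequency": (tp + fp) / len(pairs),
    }
```

    For example, with true labels `a a b b b` and predictions `a a a a b`, class `a` is predicted for 80% of the examples with recall 1.0, but only half of those predictions are correct (precision 0.5).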

    There are several frameworks for comparing models.
    The one best suited to trees, in my opinion, is the Minimum Description Length (MDL) framework.
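    The MDL idea is to prefer the tree that minimizes the bits needed to encode the tree itself plus the bits needed to encode the training examples it gets wrong. Below is a deliberately simplified two-part coding scheme, assumed for illustration only: each node costs 1 bit (leaf or split flag), a split costs log2(number of attributes) to name its attribute, a leaf costs log2(number of classes) for its label, and each leaf's errors are encoded by their count and position. It ignores numeric thresholds and branch arity, so treat it as a sketch of the principle, not a standard MDL tree-pruning implementation. Trees are hypothetical tuples: `("leaf", n_examples, n_errors)` or `("split", [children])`.

```python
from math import comb, log2


def tree_bits(node, n_attributes, n_classes):
    """Bits to encode the tree structure (simplified, assumed scheme)."""
    if node[0] == "leaf":
        return 1 + log2(n_classes)  # leaf flag + class label
    _, children = node
    # split flag + which attribute is tested + the subtrees
    return 1 + log2(n_attributes) + sum(tree_bits(c, n_attributes, n_classes) for c in children)


def exception_bits(node, n_classes):
    """Bits to encode which training examples each leaf misclassifies."""
    if node[0] == "leaf":
        _, n, k = node
        bits = log2(n + 1) + log2(comb(n, k))  # how many errors, then which ones
        if n_classes > 2:
            bits += k * log2(n_classes - 1)  # the true class of each error
        return bits
    return sum(exception_bits(c, n_classes) for c in node[1])


def description_length(tree, n_attributes, n_classes):
    """Total two-part code length; lower is better under MDL."""
    return tree_bits(tree, n_attributes, n_classes) + exception_bits(tree, n_classes)
```

    With 4 attributes and 2 classes, a single leaf covering 10 examples with 3 errors costs about 12.4 bits, while a split into two leaves of 5 examples with 0 and 1 errors costs about 14.5 bits, so under this scheme the extra split does not pay for itself. Different trees from the different criteria can be compared on the same scale this way.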

    rakirk
    Hi Karen,

    I agree with everything wessel said. I'd add that a typical trade-off analysis, done for learners in general (and decision trees are no exception), compares a model's accuracy on the data set it was built from with its accuracy when classifying new data. A more generalizable model is preferable for predictive analysis; a more accurate, specialized model is useful for understanding a particular data set. Limiting the tree depth is (in my opinion) probably the fastest way to explore these trade-offs.
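    One way to organize that exploration is sketched below: for each maximal depth you try, record the accuracy on the training data and on held-out data, then look at the gap between them and at which depth generalizes best. The function name and the input format are hypothetical; in RapidMiner the two accuracies would come from running the Performance (Classification) operator on the training set and on a separate test set (or from cross-validation).

```python
def compare_depths(results):
    """Summarize the fit-vs-generalization trade-off across tree depths.

    results: dict mapping a max-depth setting to a pair
             (accuracy on the training data, accuracy on held-out data).
    Returns the depth with the highest held-out accuracy (ties go to the
    shallower, simpler tree) and each depth's train-minus-holdout gap.
    """
    gaps = {depth: train - holdout for depth, (train, holdout) in results.items()}
    # Key (holdout accuracy, -depth): on equal holdout accuracy, prefer smaller depth.
    best = max(results, key=lambda depth: (results[depth][1], -depth))
    return best, gaps
```

    A growing gap as depth increases is the usual sign that the deeper trees are specializing to the training data rather than generalizing.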
