
Decision Tree #entropy #criterion #kappa #accuracy

CelineS Member Posts: 6 Newbie
edited August 2020 in Help
Hi guys,

   Could anyone explain how to define and detect entropy in a DT? (What do the blue and red labels under each leaf stand for?)

   Are 70% accuracy and a kappa of about 0.30 enough for prediction?

   Which criterion should I choose for the DT, "gain_ratio" or "information_gain", to maximise my accuracy and kappa?


regards,






Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Hi there, you have a few questions embedded in your post, so I'll try to comment on most of them.
    The blue/red labels under each node indicate the number of examples that fell into each category in that node.  The ratio of these forms the basis of the confidence score generated by the DT.
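To make the link between leaf counts, confidence, and entropy concrete, here is a minimal Python sketch. The function name `leaf_summary` and the counts (30 blue, 10 red) are made up for illustration, and this is not RapidMiner's internal code; it only shows the usual definitions.

```python
import math

def leaf_summary(counts):
    """Return per-class confidence and entropy (in bits) for one node."""
    total = sum(counts.values())
    # Confidence for a class = share of the node's examples in that class.
    confidences = {label: n / total for label, n in counts.items()}
    # Shannon entropy: 0 for a pure node, 1 bit for a 50/50 two-class node.
    entropy = sum(p * math.log2(1 / p) for p in confidences.values() if p > 0)
    return confidences, entropy

# Hypothetical leaf: 30 "blue" examples and 10 "red" examples.
confidences, entropy = leaf_summary({"blue": 30, "red": 10})
print(confidences)        # {'blue': 0.75, 'red': 0.25}
print(round(entropy, 3))  # 0.811
```

A leaf whose bar is almost entirely one colour has entropy close to 0, while an evenly mixed leaf has entropy close to 1 bit (for two classes).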
    If you want to optimize your tree for accuracy, you can select accuracy directly as the main criterion for tree growth. But it is not possible to say in the abstract whether an accuracy of 70% is "good enough" for prediction.  In some fields that would be considered great and used with no problem, while in other fields it would be horrible.  This question is very domain and dataset specific.
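One way to see why the answer depends on the dataset is to compare accuracy with Cohen's kappa, which corrects for agreement expected by chance. The sketch below uses a made-up confusion matrix (not the poster's data) and a hypothetical helper `accuracy_and_kappa`; the numbers were chosen only so that accuracy is 70% and kappa is about 0.30.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of examples with true class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement is plain accuracy: the diagonal over the total.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]        # true class counts
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted class counts
    # Agreement expected by chance, given the class proportions.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class result on 100 examples (69 in the majority class).
confusion = [[54, 15],
             [15, 16]]
accuracy, kappa = accuracy_and_kappa(confusion)
print(accuracy)         # 0.7
print(round(kappa, 3))  # 0.299
```

In this invented case, always predicting the majority class would already give 69% accuracy, so 70% adds little. On a balanced dataset the same 70% would mean considerably more, which is why the class distribution has to be checked before judging the numbers.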
    Information gain tends to favor attributes with more categories/specific values, because it is not adjusted for the number of possible distinct values.  Information gain ratio adjusts for this, so all else being equal, information gain ratio is probably the more robust criterion of the two (which is why it is the default).  If you want to understand how to calculate information gain, the Wikipedia article has a good summary:  https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
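The bias toward many-valued attributes can be reproduced on a toy example. The sketch below uses the textbook definitions (gain ratio = information gain divided by the entropy of the split itself); RapidMiner's implementation may differ in details, and the data and the names `row_id` and `group` are invented for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a list of categorical values."""
    total = len(values)
    return sum((n / total) * math.log2(total / n) for n in Counter(values).values())

def information_gain(attribute, labels):
    """Reduction in label entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [lab for att, lab in zip(attribute, labels) if att == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(attribute, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0

labels = ["yes"] * 4 + ["no"] * 4
row_id = list(range(8))           # unique value per row, useless for prediction
group = ["a"] * 5 + ["b"] * 3     # two values, genuinely related to the label

for name, attribute in [("row_id", row_id), ("group", group)]:
    print(name, round(information_gain(attribute, labels), 3),
          round(gain_ratio(attribute, labels), 3))
```

Here information gain ranks the ID-like attribute first (it splits the data into eight pure single-row nodes), while gain ratio penalises the eight-way split and ranks the two-valued attribute first.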

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts