Decision tree prediction accuracy

amabdellatif Member Posts: 2 Contributor I
edited November 2019 in Help

Could you please advise on the following:

1- How can I increase the accuracy of the Decision Tree operator?

2- On what basis should I choose the values of the decision tree's parameters?

3- If you recommend using the "Optimizer", is there any documentation that explicitly explains and defines each parameter?

Thanks in advance; I look forward to your response.

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 564 Unicorn

    Hi @amabdellatif, actually the answer depends on what you are trying to predict. 

    For example:

    • Information Gain tends to favour attributes with many distinct values (such as an ID column), because splitting on them appears highly predictive. This means you should select features carefully to prevent overfitting.
    • Information Gain Ratio compensates for this by biasing against attributes that have a large number of distinct values; sometimes this means it will favour attributes that are less predictive.

    You should pick the criterion that is most relevant to your data (usually Gain Ratio, but not always).
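    To make the difference between the two criteria concrete, here is a minimal sketch in plain Python using the textbook (ID3/C4.5-style) definitions; RapidMiner's implementation may differ in detail. The toy data and the column names `customer_id` and `contact` are hypothetical, not taken from the attached data set.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information (entropy of the attribute itself)."""
    split_info = entropy(values)
    if split_info == 0:  # attribute has a single value, so no split is possible
        return 0.0
    return information_gain(values, labels) / split_info

# Hypothetical toy data: did the customer subscribe to a term deposit?
labels = ["yes", "yes", "no", "no", "no", "no", "yes", "no"]
customer_id = list(range(len(labels)))  # unique per row, like an ID column
contact = ["cell", "cell", "tel", "tel", "tel", "cell", "cell", "tel"]

for name, col in [("customer_id", customer_id), ("contact", contact)]:
    print(f"{name:12s} gain={information_gain(col, labels):.3f} "
          f"ratio={gain_ratio(col, labels):.3f}")
```

    On this toy data the ID column scores highest on information gain (it "explains" every row perfectly) but lowest on gain ratio, which is the bias described in the two bullets above.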

    Can you tell us a little more about what you want to do? Also, explore your data to see how many examples of each class there are. Another problem you may run into unexpectedly is imbalanced data (one class is much more frequent than the other). In that case, a decision tree optimised for accuracy might be 99% accurate simply by predicting every example as the majority class.
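    A minimal sketch of that accuracy trap, assuming hypothetical "yes"/"no" subscription labels with a 99:1 split (not the real data set): a "model" that always predicts the majority class looks excellent on accuracy while never finding a single subscriber.

```python
from collections import Counter

def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

def recall(labels, predictions, positive):
    """Share of actual `positive` examples that were predicted as `positive`."""
    hits = [p == positive for y, p in zip(labels, predictions) if y == positive]
    return sum(hits) / len(hits) if hits else 0.0

# Hypothetical imbalanced labels: 99 customers said "no", 1 said "yes".
labels = ["no"] * 99 + ["yes"]
print(Counter(labels))  # always check the class counts first

# A degenerate "model" that predicts the majority class for everyone.
always_no = ["no"] * len(labels)
print(f"accuracy={accuracy(labels, always_no):.2f}")
print(f"recall(yes)={recall(labels, always_no, 'yes'):.2f}")
```

    Here accuracy is 0.99 while recall on the "yes" class is 0, so with imbalanced classes it is worth looking at per-class measures such as recall, not only overall accuracy.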

    For the documentation, have you tried the help files for the operator? They are pretty useful as an explanation.

  • amabdellatif Member Posts: 2 Contributor I

    Hello @JEdward

    Thanks a lot for your reply.

    Regarding your question about what I am trying to predict: it is whether a bank's customers subscribe to term deposits (this is trial data, not real data).

    I have attached the whole data set in case you can help me figure out where to start.

    Thanks in advance for your time and attention.

    PS: The attached file is an .xlsx file; I changed its extension to .dox only so that I could upload it.
