Generating a hierarchy for decision tree models?

eldenoso Member Posts: 65 Contributor I
edited December 2018 in Help

Hello all,

I am currently trying to visualize a large dataset with decision trees. The data has a hierarchy, and I am wondering whether that hierarchy can be applied to the decision tree. For instance, I have data for the years 2010-2015. Logically, it makes little sense for the tree to split on 2015 first and on 2010 afterwards, but unfortunately that is exactly what the model does. Is there a way to let the tree "know" that these attribute values are chronological?

Thank you 

Philipp :)

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,235 Unicorn
    I would loop over the values for the year attribute and generate a different tree for each one (thus removing the year attribute itself from appearing in the tree).
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
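
The loop Brian describes can be sketched outside RapidMiner as well. The following is only a minimal illustration in plain Python, not a RapidMiner process: the attribute names (`year`, `region`, `label`), the sample rows, and the helper functions are all made up for the example, and `majority_label` is a trivial stand-in for whatever tree learner would actually be trained on each subset.

```python
from collections import Counter, defaultdict

def split_by_year(rows, year_key="year"):
    """Group rows by year and drop the year attribute from every row."""
    groups = defaultdict(list)
    for row in rows:
        rest = {k: v for k, v in row.items() if k != year_key}
        groups[row[year_key]].append(rest)
    # Sort so the per-year results come out in chronological order.
    return dict(sorted(groups.items()))

def majority_label(rows, label_key="label"):
    """Hypothetical stand-in learner: returns the most frequent label."""
    return Counter(r[label_key] for r in rows).most_common(1)[0][0]

# Made-up example data, not taken from the thread.
rows = [
    {"year": 2011, "region": "north", "label": "buy"},
    {"year": 2010, "region": "south", "label": "wait"},
    {"year": 2010, "region": "north", "label": "wait"},
    {"year": 2011, "region": "south", "label": "buy"},
    {"year": 2010, "region": "north", "label": "buy"},
]

per_year = split_by_year(rows)
# One "model" per year; the year itself can no longer appear in any of them.
models = {year: majority_label(subset) for year, subset in per_year.items()}
```

Because each subset is trained on its own, any effect that carries over from one year to the next is not visible to the individual models.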
  • eldenoso Member Posts: 65 Contributor I

    Thank you Brian! :)

    But isn't an action in one year important for actions in the following years? If I generate a separate tree for each year, that influence is split up as well and no longer visible, isn't it?

    Furthermore, the tree I have is so large that I could print it on the wall of a house :) Is increasing the minimal gain the only way to reduce its size?

    Regards
    Philipp
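
On the size question: minimal gain is one stopping rule, and a limit on the depth is another common one. The sketch below is a toy tree grower in plain Python, written only to show how both knobs shrink a tree. The names `min_gain` and `max_depth`, the single numeric attribute, and the data are assumptions for the illustration and do not claim to mirror RapidMiner's implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (gain, threshold) of the best binary split on one numeric attribute."""
    base, n = entropy(ys), len(ys)
    best = (0.0, None)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[0]:
            best = (gain, t)
    return best

def count_nodes(xs, ys, min_gain=0.0, max_depth=10, depth=0):
    """Grow a tree greedily and return how many nodes it ends up with."""
    if depth >= max_depth:
        return 1  # depth limit reached: this node becomes a leaf
    gain, t = best_split(xs, ys)
    if t is None or gain <= min_gain:
        return 1  # no split is worth it: this node becomes a leaf
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (1
            + count_nodes(*zip(*left), min_gain, max_depth, depth + 1)
            + count_nodes(*zip(*right), min_gain, max_depth, depth + 1))

# Made-up data: one numeric attribute and a noisy two-class label.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "b", "a", "b", "b", "a", "b"]

full = count_nodes(xs, ys)                 # no restrictions
shallow = count_nodes(xs, ys, max_depth=1)  # at most one split
pruned = count_nodes(xs, ys, min_gain=0.2)  # only clearly useful splits
```

In this toy setting both the depth limit and the higher gain threshold give a smaller tree than the unrestricted run, so raising the minimal gain is not the only lever.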

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,132 RM Data Scientist

    Moin Philipp,

    Is your year numerical or nominal? I would assume it needs to be numerical for the tree to capture the hierarchy.

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • eldenoso Member Posts: 65 Contributor I

    Hey Martin,

    thanks for your reply! :)

    The year is of the numerical type yyyy, so that should not be a problem. Before building the tree I discretized and normalized all attributes, but IMO that shouldn't be a problem either?

    Regards,

    Philipp

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,132 RM Data Scientist

    Hey,

    Normalizing does not make a difference in a tree. I would not discretize the year.

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
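
Martin's point about normalization can be checked with a small experiment. The sketch below is plain Python with made-up data and a hypothetical `best_threshold` helper. It searches for the best threshold split on a numeric year, once on the raw values and once on min-max scaled values, and compares which rows end up on the left side. Gini impurity is used here purely as an example criterion.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(xs, ys):
    """Threshold on one numeric attribute that minimises weighted Gini impurity."""
    n = len(ys)
    best_score, best_t = float("inf"), None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between two neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

# Made-up data: the label changes between 2012 and 2013.
years = [2010, 2011, 2012, 2013, 2014, 2015]
labels = ["low", "low", "low", "high", "high", "high"]

# Min-max normalisation of the year to the range [0, 1].
lo, hi = min(years), max(years)
scaled = [(y - lo) / (hi - lo) for y in years]

t_raw = best_threshold(years, labels)
t_scaled = best_threshold(scaled, labels)

# Which rows fall on the left side of the split in each case?
left_raw = [y <= t_raw for y in years]
left_scaled = [s <= t_scaled for s in scaled]
```

The threshold value changes with the scaling, but the rows are partitioned identically, because a threshold split depends only on the order of the values. It also shows why a numerical year respects chronology: the split has the form "year <= threshold", so earlier and later years always land on opposite sides in order.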