Generating a hierarchy for decision tree models?

eldenoso Member Posts: 65 Contributor I
edited December 2018 in Help

Hello all,

I am currently experimenting with visualizing big data using decision trees. My dataset is large and has a hierarchy, and I am wondering whether that hierarchy can be applied to the decision tree. For instance, I have data for the years 2010-2015. Logically, it makes little sense for the tree to split on the year 2015 first and on the year 2010 afterwards, but unfortunately that is exactly what the model does. Is there a way of letting the tree "know" that these values are in chronological order?

Thank you 

Philipp :)

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I would loop over the values for the year attribute and generate a different tree for each one (thus removing the year attribute itself from appearing in the tree).
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
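
The per-year approach described above can be sketched in plain Python. This is only a minimal illustration, not a RapidMiner process: the example rows, the attribute names (`year`, `spend`, `label`), and the one-level `fit_stump` helper are hypothetical stand-ins for the real data and for a full decision tree learner.

```python
from collections import defaultdict

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def fit_stump(rows, attribute, target):
    """Return the threshold on `attribute` with the lowest weighted Gini (None if no split exists)."""
    values = sorted({r[attribute] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r[target] for r in rows if r[attribute] <= threshold]
        right = [r[target] for r in rows if r[attribute] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold)
    return None if best is None else best[1]

# Hypothetical example data; the real dataset would cover 2010-2015.
rows = [
    {"year": 2010, "spend": 10, "label": "no"},
    {"year": 2010, "spend": 40, "label": "yes"},
    {"year": 2010, "spend": 50, "label": "yes"},
    {"year": 2011, "spend": 20, "label": "no"},
    {"year": 2011, "spend": 30, "label": "no"},
    {"year": 2011, "spend": 70, "label": "yes"},
]

# Group the rows by year so that the year itself never appears in a model.
by_year = defaultdict(list)
for row in rows:
    by_year[row["year"]].append(row)

# One model per year, trained only on the remaining attribute(s).
models = {year: fit_stump(group, "spend", "label")
          for year, group in sorted(by_year.items())}

for year, threshold in models.items():
    print(f"{year}: split at spend <= {threshold}")
```

Each model only sees the rows of its own year, so any effect that carries over from one year to the next is not represented in this setup.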
  • eldenoso Member Posts: 65 Contributor I

    Thank you Brian! :)

    But isn't an action in one year important for actions in the following years? If I generate a tree for each year, isn't that influence split up and no longer visible?

    Furthermore, the tree I have is so large that I could print it out on a house wall :D Is increasing the minimal gain the only way to reduce its size?

    Regards
    Philipp

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    Moin Philipp,

    Is your year attribute numerical or nominal? I would assume it needs to be numerical for the tree to capture the hierarchy.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
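
To illustrate why the attribute type matters, here is a minimal Python sketch under the assumption that the learner splits numerical attributes on thresholds and nominal attributes into one branch per value, as many tree learners do. The helper names `numeric_candidate_splits` and `nominal_branches` are hypothetical and not part of any library.

```python
# Years in arbitrary order, as they might appear in the data.
years = [2015, 2010, 2013, 2011, 2014, 2012]

def numeric_candidate_splits(values):
    """Candidate thresholds for a numerical attribute: midpoints between sorted distinct values."""
    ordered = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]

def nominal_branches(values):
    """A nominal attribute is treated as unordered categories, one branch per distinct value."""
    return {str(v) for v in values}

thresholds = numeric_candidate_splits(years)

# Every numerical split keeps all earlier years on one side and all later years on the other.
for t in thresholds:
    earlier = sorted(y for y in years if y <= t)
    later = sorted(y for y in years if y > t)
    print(f"year <= {t}: {earlier} | year > {t}: {later}")

# The nominal view carries no ordering information at all.
print(sorted(nominal_branches(years)))
```

With a numerical year, every possible split respects the chronology; with a nominal year, 2015 and 2010 are just two unrelated categories.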
  • eldenoso Member Posts: 65 Contributor I

    Hey Martin,

    thanks for your reply! :)

    The year is numerical (format yyyy), so that shouldn't be a problem. What I did beforehand was to discretize and normalize all attributes, but IMO that shouldn't be a problem either?

    Regards,

    Philipp

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    Hey,

    Normalizing does not make a difference in a tree. I would not discretize the year.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
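
The point about normalization can be checked with a small Python sketch. It assumes a threshold-based split chosen by weighted Gini impurity and min-max normalization; the data and the `best_partition` helper are hypothetical, and a real tree learner may use a different criterion.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_partition(values, labels):
    """Return (threshold, left_row_indices) for the threshold split with the lowest weighted Gini."""
    ordered = sorted(set(values))
    best = None
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [i for i, v in enumerate(values) if v <= t]
        right = [i for i, v in enumerate(values) if v > t]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(values)
        if best is None or score < best[0]:
            best = (score, t, left)
    return best[1], best[2]

# Hypothetical data: the label changes between 2012 and 2013.
years = [2010, 2011, 2012, 2013, 2014, 2015]
labels = ["low", "low", "low", "high", "high", "high"]

# Min-max normalization maps the years onto [0, 1] but keeps their order.
lo, hi = min(years), max(years)
normalized = [(y - lo) / (hi - lo) for y in years]

raw_threshold, raw_left = best_partition(years, labels)
norm_threshold, norm_left = best_partition(normalized, labels)

print(raw_threshold, raw_left)
print(norm_threshold, norm_left)
```

The threshold value changes with the scale, but the rows that end up on each side of the split are identical, because the split depends only on the order of the values. Discretizing the year is different: merging years into bins removes candidate thresholds that a numerical year would offer.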