Attribute weight help

PP Member Posts: 9 Contributor II
edited November 2018 in Help
Hi,
I'm working with decision trees and trying to understand the most important factors that lead to a sewer pipe failure. My records have attributes such as diameter, length, etc. I also have an attribute with the number of failures in each pipe, and I think this information could be used as a weight. The data is unbalanced: most of the examples have a value of 0. If I use “Set Role” to set this attribute as weight, the tree becomes completely trained and useless, so I have to take this attribute out to get results. So my question is: is there a way to use this information without overtraining the tree?

Thanks for your attention,
Paulo Praça

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Have you considered setting the role of that attribute to weight?

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • PP Member Posts: 9 Contributor II
    Yes, I did, but the tree gets overtrained: when I do that, the tree has only two leaves. The failure frequency ranges between 1 and 6: the maximum is 6 failures in one pipe, but for almost all of them I have only one failure. If I set this attribute as ‘weight’, it overshadows the other attributes.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    I think we have a different understanding of overtraining. I guess your tree simply gets worse when you do this.

    Have you tried changing the minimal gain?
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    What if you discretized that field and used it as the label?

    For example:
    Faults
    0
    1
    2-3
    4-5
    6+

    Or, even more simply, Faults: Low, Medium, High.
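
The discretization suggested above can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration of the idea, not the RapidMiner process itself: the bin edges come from the example in the post (0, 1, 2-3, 4-5, 6+), while the function name `discretize_faults` and the sample `failure_counts` are hypothetical and made up for the sketch.

```python
from collections import Counter


def discretize_faults(count: int) -> str:
    """Map a raw failure count to one of the bins 0, 1, 2-3, 4-5, 6+."""
    if count < 0:
        raise ValueError("failure count cannot be negative")
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    if count <= 3:
        return "2-3"
    if count <= 5:
        return "4-5"
    return "6+"


# Hypothetical per-pipe failure counts, invented for illustration only.
failure_counts = [0, 0, 0, 1, 0, 2, 0, 1, 6, 0, 4, 3]

# The binned value becomes the class label instead of a weight.
labels = [discretize_faults(c) for c in failure_counts]

# Inspect how unbalanced the resulting label is before training a tree.
label_distribution = Counter(labels)
print(label_distribution)
```

Checking the label distribution first shows whether some bins are nearly empty, in which case a coarser split such as Low, Medium, High may be the more practical choice.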