"Decision tree shows only the label"

madsurfer Member Posts: 2 Contributor I
edited June 2019 in Help

I have a B2B sales data set that I am trying to explore with a "Decision Tree". The attributes are: Country (polynominal), State (polynominal), DaysInSalePhase (integer), MonthlySales (integer), Deal (binominal). I set the "Deal" attribute (a "Won" or "Lost" column) as the label. But the decision tree almost never shows attributes such as Country or State and focuses only on the integer values. If I want to see anything in the decision tree, I have to disable all the pruning options (which is not ideal, is it?), and most of the time the decision tree is just a single box showing me the number of "Won" and "Lost".

Any idea what I am doing wrong? Is my data not good enough?



  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi David,

    You are on the right track.

    The decision tree only performs a split, i.e. inserts a node, if it can find an attribute that contributes and improves the quality of the tree. If it does not split by Country or State, it means that these attributes do not have a strong correlation to the label.
    If there is only one node, then the tree did not find any useful attribute.

    You can try to reduce the "minimal gain" parameter of the tree to allow for splits on less significant attributes.
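
To make the "minimal gain" idea concrete, here is a rough sketch in plain Python (not RapidMiner's implementation, and the criterion RapidMiner actually applies may differ). It computes the information gain of a nominal attribute against the label and only allows a split when the gain reaches a threshold. The toy data, the discretized `phase` attribute, and the function names are all made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the label inside each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def should_split(values, labels, minimal_gain):
    """Mimics a 'minimal gain' style threshold: split only if the gain is large enough."""
    return information_gain(values, labels) >= minimal_gain


# Hypothetical toy data: Country carries no information about Deal, phase does.
deal = ["Won", "Won", "Lost", "Lost", "Won", "Lost", "Won", "Lost"]
country = ["FR", "DE", "FR", "DE", "FR", "DE", "DE", "FR"]
phase = ["short", "short", "long", "long", "short", "long", "short", "long"]

for name, column in (("Country", country), ("phase", phase)):
    gain = information_gain(column, deal)
    print(name, round(gain, 3), should_split(column, deal, minimal_gain=0.1))
```

In this toy case Country is split evenly between "Won" and "Lost" in every value, so no threshold above zero would ever let the tree split on it. Lowering the threshold only helps when the attribute carries at least a little signal.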

    You should probably also try another learning algorithm that may be better suited for your data.

    Did you validate your tree with a cross validation to see how well it performs?
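
For reference, this is roughly what a cross validation does, sketched in plain Python under simplifying assumptions (contiguous folds, no shuffling or stratification, hypothetical data). The learner here is a majority-class baseline, which is what a tree consisting of a single box amounts to, so its accuracy is a useful number to compare any real tree against.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def majority_learner(train_rows, train_labels):
    """Baseline: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority


def cross_validate(rows, labels, learner, k=5):
    """Mean accuracy over k folds; `learner` returns a predict(row) function."""
    accuracies = []
    for train, test in k_fold_indices(len(rows), k):
        model = learner([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(model(rows[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical data: 7 "Lost" and 3 "Won" deals.
rows = [{"DaysInSalePhase": d} for d in (3, 40, 5, 7, 2, 55, 9, 4, 6, 61)]
labels = ["Lost", "Won", "Lost", "Lost", "Lost", "Won", "Lost", "Lost", "Lost", "Won"]

baseline_accuracy = cross_validate(rows, labels, majority_learner, k=5)
print(baseline_accuracy)
```

If a tuned tree does not beat this baseline under cross validation, the extra splits are probably not capturing anything real.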

    Best regards,
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I'd have a look at these two attributes to see if you can break them down further.

    DaysInSalePhase (integer),
    MonthlySales (integer)  - particularly this one.

    Is it possible to return to the source data and calculate more attributes? 
    LastQuarterSales (integer)
    LastMonthSales (integer)
    MonthlySales (integer)

    Try to increase the amount of information available to the model. 
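
As a small sketch of what calculating more attributes could look like, assuming the source data holds a per-customer list of monthly sales ordered oldest first (that layout, the function name, and treating MonthlySales as a rounded average are all assumptions for illustration):

```python
def derive_sales_attributes(monthly_history):
    """Derive extra attributes from monthly sales figures, oldest first (assumed layout)."""
    if not monthly_history:
        raise ValueError("need at least one month of sales")
    last_quarter = monthly_history[-3:]  # up to the three most recent months
    return {
        "LastMonthSales": monthly_history[-1],
        "LastQuarterSales": sum(last_quarter),
        # Assumption: MonthlySales is the average over the whole history.
        "MonthlySales": round(sum(monthly_history) / len(monthly_history)),
    }


# Hypothetical customer history.
history = [120, 90, 150, 200, 180, 60]
derived = derive_sales_attributes(history)
print(derived)
```

Recent-period attributes like these can separate a customer whose sales are dropping from one with the same long-run average, which a single MonthlySales value cannot do.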
  • MBA_Data_Miner Member Posts: 21 Contributor II
    (Building on what others have said) You could try an Optimize Parameters operator around a cross validation (with the decision tree inside). Optimize the information gain parameter for starters.

    You can add more parameters to optimize, but this will increase compute time exponentially. Experiment and record the settings and results until you get a tree you are satisfied with.
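
To see why compute time climbs so quickly, here is a quick plain-Python count of how many models a grid search has to train. The parameter names and candidate values below are illustrative placeholders, not a recommendation and not the operator's actual configuration format.

```python
from itertools import product

# Illustrative grid: three candidate values for each of three tree parameters.
grid = {
    "minimal_gain": [0.01, 0.05, 0.1],
    "maximal_depth": [5, 10, 20],
    "minimal_leaf_size": [2, 5, 10],
}


def grid_combinations(grid):
    """Every combination of the candidate values, as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def total_model_fits(grid, folds):
    """Each combination is trained once per cross validation fold."""
    return len(grid_combinations(grid)) * folds


combos = grid_combinations(grid)
print(len(combos), total_model_fits(grid, folds=10))
```

Each additional parameter multiplies the number of combinations by its number of candidate values, so it usually pays to start with one or two parameters and a coarse set of values.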
