Decision tree

sshilderman Member Posts: 9 Contributor II
edited November 2018 in Help

I'm trying to use a decision tree to predict whether a user will leave.

My data include 4 regular attributes (2 nominal, 2 integer), and 1 special attribute (nominal label).

When I use the Decision Tree operator, the tree doesn't use all of my attributes. Only one of the regular attributes appears (as the root), and the leaves contain the label values (which is OK).


What am I doing wrong?



    bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    Hello, this may simply be happening because your data does not contain patterns that meet the criteria you set.


    I suggest trying different values for the pruning, prepruning, and confidence parameters.
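The following sketch shows how prepruning alone can stop a tree at its root. It is a minimal Python illustration, not RapidMiner's implementation. It uses a tiny ID3-style learner on a hypothetical 8-row churn table, with two hypothetical prepruning thresholds, `min_gain` and `min_leaf_size`, that stand in for the kind of prepruning settings mentioned above. It uses plain information gain, which may differ from the criterion your Decision Tree operator uses, and it does not model confidence-based (post-)pruning.

```python
import math
from collections import Counter

# Illustration only: not RapidMiner's algorithm; thresholds are hypothetical.

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split(rows, attr):
    """Group rows by their value of a nominal attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups

def info_gain(rows, attr, label):
    """Information gain of splitting rows on a nominal attribute."""
    remainder = sum(len(g) / len(rows) * entropy([r[label] for r in g])
                    for g in split(rows, attr).values())
    return entropy([r[label] for r in rows]) - remainder

def grow(rows, attrs, label, min_gain, min_leaf_size):
    """Grow a tree with prepruning: returns a leaf label or (attr, {value: subtree})."""
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    gains = {a: info_gain(rows, a, label) for a in attrs}
    best = max(gains, key=gains.get)
    groups = split(rows, best)
    # Prepruning: refuse the split if it gains too little or makes a leaf too small.
    if gains[best] < min_gain or min(map(len, groups.values())) < min_leaf_size:
        return majority
    rest = [a for a in attrs if a != best]
    return (best, {v: grow(g, rest, label, min_gain, min_leaf_size)
                   for v, g in groups.items()})

def depth(tree):
    """Number of split levels (0 means the tree is a single leaf)."""
    return 0 if not isinstance(tree, tuple) else 1 + max(map(depth, tree[1].values()))

# Hypothetical toy churn data: (plan, support, churn) for 8 users.
data = [
    ("basic", "yes", "yes"), ("basic", "yes", "yes"),
    ("basic", "no", "yes"), ("basic", "no", "no"),
    ("premium", "yes", "no"), ("premium", "yes", "no"),
    ("premium", "no", "no"), ("premium", "no", "yes"),
]
rows = [dict(zip(("plan", "support", "churn"), t)) for t in data]

full = grow(rows, ["plan", "support"], "churn", min_gain=0.0, min_leaf_size=1)
stump = grow(rows, ["plan", "support"], "churn", min_gain=0.0, min_leaf_size=3)
print(depth(full), depth(stump))  # 2 1
print(stump)  # ('plan', {'basic': 'yes', 'premium': 'no'})
```

With a minimal leaf size of 3, the second-level splits (groups of 2 rows) are refused, so only `plan` appears, as the root. This is the same symptom described in the question, and it is likely on small, hand-made tables.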


    A better way to find the right values is to use the "Optimize Parameters (Grid)" operator and give it ranges, so that it tries combinations of the parameters that affect your model.


    You should find a sample process in the help for "Optimize Parameters (Grid)" that shows how this operator works.
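The grid idea above comes down to an exhaustive loop: try every combination of the candidate values and keep the best-scoring one. Below is a minimal Python sketch of that loop, not the operator's actual implementation. The parameter names `min_gain` and `min_leaf_size` and the scoring function `toy_score` are hypothetical. They stand in for the Decision Tree settings and for whatever performance measure the inner process reports (for example, accuracy from a cross-validation).

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score).

    param_grid maps parameter names to lists of candidate values;
    evaluate takes a dict of parameters and returns a score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # strict '>' keeps the first of any tied combinations
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function standing in for a real cross-validated accuracy.
def toy_score(p):
    return 1.0 - abs(p["min_gain"] - 0.1) - 0.05 * abs(p["min_leaf_size"] - 2)

grid = {"min_gain": [0.0, 0.1, 0.2], "min_leaf_size": [1, 2, 4]}
best, score = grid_search(grid, toy_score)
print(best, score)  # {'min_gain': 0.1, 'min_leaf_size': 2} 1.0
```

The number of evaluations is the product of the list lengths (9 here). This is why it pays to keep each range short when every evaluation means training and validating a model.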


    Good luck!

    sshilderman Member Posts: 9 Contributor II

    Follow-up question:


    First of all, thank you for your answer.

    First, I created a table with patterns manually, to check that I'm doing it right.


    Is there a way to know who is located in each leaf?

    I would like to find out which users will have a specific value (the label value) in the future.



    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist



    What you can do is use the Tree to Rules operator. As a result (see the attached process), you get the paths as strings, which might be a helpful first step. There is no single-operator solution for applying these rules to a data set to get "leaf IDs", but it might be possible to build a working process with operators such as Write as Text and then parse the resulting text files.
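The parse-the-text approach could look roughly like this minimal Python sketch. Several things in it are assumptions for illustration: that each rule appears as one line of the form `if <attr> <op> <value> and ... then <label>`, possibly followed by counts in parentheses; the sample rule text itself; and the helper names `parse_rules`, `holds`, and `leaf_id`. Because each rule corresponds to one root-to-leaf path of the tree, the index of the rule a row satisfies can serve as its leaf ID.

```python
import re

# Assumed rule-line format: "if A = x and B > 1.5 then label  (optional counts)".
RULE_RE = re.compile(r"^if (.+) then (\S+)")
COND_RE = re.compile(r"^(.+?)\s*(<=|≤|>|=)\s*(.+)$")

def parse_rules(text):
    """Return a list of (conditions, label); each condition is (attr, op, value)."""
    rules = []
    for line in text.splitlines():
        m = RULE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and summary lines
        conds = []
        for part in m.group(1).split(" and "):
            c = COND_RE.match(part.strip())
            if c is None:
                raise ValueError(f"cannot parse condition: {part!r}")
            conds.append(c.groups())
        rules.append((conds, m.group(2)))
    return rules

def holds(row, attr, op, value):
    """'=' compares nominal values as strings; '<=', '≤', '>' compare numbers."""
    if op == "=":
        return str(row[attr]) == value
    x, v = float(row[attr]), float(value)
    return x <= v if op in ("<=", "≤") else x > v

def leaf_id(row, rules):
    """Index of the first rule (root-to-leaf path) the row satisfies, or None."""
    for i, (conds, _label) in enumerate(rules):
        if all(holds(row, *c) for c in conds):
            return i
    return None

# Hypothetical rule text; in practice you would read the file written by Write as Text.
rules_text = """\
if Outlook = overcast then yes  (0 / 4)
if Outlook = rain and Wind = false then yes  (0 / 3)
if Outlook = rain and Wind = true then no  (2 / 0)
if Outlook = sunny and Humidity > 77.500 then no  (3 / 0)
if Outlook = sunny and Humidity ≤ 77.500 then yes  (0 / 2)
"""
rules = parse_rules(rules_text)
rows = [
    {"Outlook": "sunny", "Humidity": 85, "Wind": "false"},
    {"Outlook": "rain", "Humidity": 70, "Wind": "true"},
]
for row in rows:
    i = leaf_id(row, rules)
    print(i, rules[i][1])  # leaf ID and that leaf's predicted label
```

Tree paths do not overlap, so at most one rule should match each row. `leaf_id` returns `None` when no rule covers a row. Attribute values that themselves contain " and " would need a more careful parser.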




    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
      <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="tree_to_rules" compatibility="7.1.001" expanded="true" height="82" name="Tree to Rules" width="90" x="246" y="85">
            <process expanded="true">
              <operator activated="true" class="parallel_decision_tree" compatibility="7.1.001" expanded="true" height="82" name="Decision Tree" width="90" x="45" y="34"/>
              <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="380" y="85">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve Golf" from_port="output" to_op="Tree to Rules" to_port="training set"/>
          <connect from_op="Tree to Rules" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Tree to Rules" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany