Decision tree: keep ID

RichyRichy, Member, Contributor II
Edited May 2019 in Help
Hello,

I'm trying to use a decision tree in RapidMiner, and I can't find out how to keep the ID of each example through the process. Here is an example of my process and what I get in RapidMiner:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
   <process expanded="true" height="501" width="683">
     <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
       <parameter key="repository_entry" value="//Samples/data/Golf"/>
     </operator>
     <operator activated="true" class="generate_id" compatibility="5.1.006" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
     <operator activated="true" class="generate_attributes" compatibility="5.1.006" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
       <list key="function_descriptions">
         <parameter key="indic" value="rand()"/>
       </list>
     </operator>
     <operator activated="true" class="numerical_to_polynominal" compatibility="5.1.006" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="447" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="indic"/>
     </operator>
     <operator activated="true" class="discretize_by_bins" compatibility="5.1.006" expanded="true" height="94" name="Discretize" width="90" x="45" y="210">
       <parameter key="attribute_filter_type" value="subset"/>
       <parameter key="attributes" value="|Humidity|Temperature"/>
       <parameter key="number_of_bins" value="3"/>
     </operator>
     <operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role" width="90" x="179" y="210">
       <parameter key="name" value="Play"/>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role (2)" width="90" x="313" y="210">
       <parameter key="name" value="indic"/>
       <parameter key="target_role" value="label"/>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="decision_tree" compatibility="5.1.006" expanded="true" height="76" name="Decision Tree" width="90" x="447" y="210"/>
     <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
     <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
     <connect from_op="Generate Attributes" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
     <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Discretize" to_port="example set input"/>
     <connect from_op="Discretize" from_port="example set output" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
     <connect from_op="Set Role (2)" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
     <connect from_op="Decision Tree" from_port="model" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="180"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Result:
Wind = false
|   Temperature = range1 [-∞ - 71]: 0.406 {0.038=0, 0.209=0, 0.248=0, 0.406=1, 0.075=1, 0.575=0, 0.578=0, 0.118=0, 0.746=1, 0.397=0, 0.524=0, 0.639=0, 0.716=0, 0.297=0}
|   Temperature = range2 [71 - 78]: 0.118 {0.038=0, 0.209=0, 0.248=0, 0.406=0, 0.075=0, 0.575=0, 0.578=0, 0.118=1, 0.746=0, 0.397=1, 0.524=0, 0.639=0, 0.716=0, 0.297=0}
|   Temperature = range3 [78 - ∞]: 0.038 {0.038=1, 0.209=0, 0.248=1, 0.406=0, 0.075=0, 0.575=0, 0.578=0, 0.118=0, 0.746=0, 0.397=0, 0.524=0, 0.639=0, 0.716=1, 0.297=0}
Wind = true
|   Outlook = overcast: 0.578 {0.038=0, 0.209=0, 0.248=0, 0.406=0, 0.075=0, 0.575=0, 0.578=1, 0.118=0, 0.746=0, 0.397=0, 0.524=0, 0.639=1, 0.716=0, 0.297=0}
|   Outlook = rain: 0.575 {0.038=0, 0.209=0, 0.248=0, 0.406=0, 0.075=0, 0.575=1, 0.578=0, 0.118=0, 0.746=0, 0.397=0, 0.524=0, 0.639=0, 0.716=0, 0.297=1}
|   Outlook = sunny: 0.209 {0.038=0, 0.209=1, 0.248=0, 0.406=0, 0.075=0, 0.575=0, 0.578=0, 0.118=0, 0.746=0, 0.397=0, 0.524=1, 0.639=0, 0.716=0, 0.297=0}
In the result, each leaf shows how many times each label value occurs, but I need the ID of each example in that leaf instead. In other words, I need a result like this:
Wind = false
|   Temperature = range1 [-∞ - 71]: 0.406 {4,5,9}
|   Temperature = range2 [71 - 78]: 0.118 {8,10}
|   Temperature = range3 [78 - ∞]: 0.038 {1,3,13}
Wind = true
|   Outlook = overcast: 0.578 {7,12}
|   Outlook = rain: 0.575 {6,14}
|   Outlook = sunny: 0.209 {2,11}
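A possible post-processing workaround, sketched in Python under assumptions: because `indic` is generated with `rand()`, every example has its own label value, so each count map above has exactly one entry per example. If the entries are listed in example order (entry number k belongs to id k), which is what the desired output implies but which I have not confirmed RapidMiner guarantees, the counts can be turned into id lists after the fact. `leaf_to_ids` is a hypothetical helper, not part of RapidMiner.

```python
import re

# One printed leaf: "<path>: <prediction> {<value>=<count>, ...}".
# Inner nodes such as "Wind = true" have no braces and do not match.
LEAF = re.compile(r"^(?P<path>.*?):\s*(?P<pred>\S+)\s*\{(?P<dist>.*)\}\s*$")


def leaf_to_ids(line: str) -> str:
    """Replace a leaf's class-count map with the ids of the examples it holds.

    Assumes entry number k in the braces belongs to the example with id k,
    which only makes sense when every label value is unique (as with rand()).
    """
    m = LEAF.match(line)
    if m is None:
        return line  # inner node: keep unchanged
    ids = []
    for position, entry in enumerate(m.group("dist").split(","), start=1):
        _value, count = entry.strip().rsplit("=", 1)
        if int(count) > 0:
            ids.append(str(position))
    return f"{m.group('path')}: {m.group('pred')} {{{','.join(ids)}}}"


# Two lines copied from the result above.
tree_text = """Wind = true
|   Outlook = sunny: 0.209 {0.038=0, 0.209=1, 0.248=0, 0.406=0, 0.075=0, 0.575=0, 0.578=0, 0.118=0, 0.746=0, 0.397=0, 0.524=1, 0.639=0, 0.716=0, 0.297=0}"""
for line in tree_text.splitlines():
    print(leaf_to_ids(line))
```

This only works when label values are unique and the map order follows the example order; with a real label where several examples share a value, the counts cannot be mapped back to ids this way.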
Is there a simple way to do this with the decision tree?
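A more general approach does not rely on the order of the count map: route every example, together with its id, through the tree and collect the ids per leaf. In RapidMiner, the attribute created by Generate ID has the special id role, so the learner does not use it as a feature but it stays in the example set. The grouping step itself is sketched here in Python only; `leaf_of` is a hypothetical hand-coded copy of the tree printed above, and the rows in the usage part are made up, not the Golf data.

```python
from __future__ import annotations

from collections import defaultdict


def leaf_of(example: dict) -> str:
    """Hypothetical hand-coded copy of the tree printed above."""
    if example["Wind"] == "false":
        # Wind = false branch splits on the discretized Temperature
        return f"Wind = false | Temperature = {example['Temperature']}"
    # Wind = true branch splits on Outlook
    return f"Wind = true | Outlook = {example['Outlook']}"


def ids_per_leaf(examples: list[dict]) -> dict[str, list[int]]:
    """Group example ids by the leaf each example reaches, keeping input order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for example in examples:
        groups[leaf_of(example)].append(example["id"])
    return dict(groups)


# Hypothetical rows, only to show the call.
rows = [
    {"id": 1, "Wind": "false", "Temperature": "range3", "Outlook": "sunny"},
    {"id": 2, "Wind": "true", "Temperature": "range3", "Outlook": "sunny"},
    {"id": 3, "Wind": "false", "Temperature": "range3", "Outlook": "overcast"},
]
for leaf, ids in ids_per_leaf(rows).items():
    print(leaf, ids)
```

Because the id is carried with each example rather than inferred from the label distribution, this works for any label, including ones where several examples share the same value.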