Get the paths that are defining my predictions in some decision trees

PB941 Member Posts: 6 Contributor II
edited December 2018 in Help

Hello everyone:


I have been using this tool for a very short time and a "small" setback has arisen, or so I hope.


I'm running a model that uses a meta-operator, specifically MetaCost, with a decision tree as the ensemble's base learner. The model returns several different trees, so far so good. The problem is that I cannot find a way to get the paths that determine the prediction in each of these trees separately.


In a previous post I found this solution:



Accepted by topic author cypressproject0

06-09-2017 04:15 AM

Re: How to know ... in a tree model or tree to rules?

You can also use the "Operator Toolbox" Extension. This extension has an operator named "Get Decision Tree Path" that will produce a new variable with the path (a text variable) you have to follow to reach the final node for each observation. Be aware that this extension requires the Text Mining extension. So make sure you load it too.


Here's the process:



[Image: Screen Shot 2017-06-07 at 8.49.19 AM.png]


And here's the output:



[Image: Screen Shot 2017-06-07 at 8.50.45 AM.png]


Later in that same thread, someone raises the same problem that brings me here today: "This solution is for a pure tree, not for an ensemble operator such as MetaCost."


This solution was then offered:



my problem MetaCost is also an ensemble model. Attached is a process with a small script converting the meta model into a collection of models and then transforming them into example sets. Note that it does not take the logic of MetaCost into account. "


And then the script.


So my question is: what do I do with that script? How can I achieve what I need, that is, how can I find out which path leads to each prediction in each of the trees? Any help would be greatly appreciated.


Thank you



    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @PB941 - did you actually run the process that @mschmitz posted in that thread? It works very well. He put his Groovy code inside the Execute Script operator; here is the whole process:


    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Sonar" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="7.6.001" expanded="true" height="124" name="Validation" width="90" x="246" y="136">
            <process expanded="true">
              <operator activated="true" class="metacost" compatibility="7.6.001" expanded="true" height="82" name="MetaCost" width="90" x="112" y="30">
                <parameter key="cost_matrix" value="[0.0 2.0;3.0 0.0]"/>
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.6.001" expanded="true" height="76" name="Decision Tree (2)" width="90" x="313" y="30"/>
                  <connect from_port="training set" to_op="Decision Tree (2)" to_port="training set"/>
                  <connect from_op="Decision Tree (2)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <connect from_port="training" to_op="MetaCost" to_port="training set"/>
              <connect from_op="MetaCost" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="179" y="30">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="execute_script" compatibility="7.6.001" expanded="true" height="82" name="Execute Script" width="90" x="447" y="34">
            <parameter key="script" value="&#10;&#10;import com.rapidminer.operator.learner.meta.MetaCostModel;&#10;import com.rapidminer.operator.IOObjectCollection;&#10;import com.rapidminer.operator.learner.PredictionModel;&#10;&#10;MetaCostModel meta = input[0]&#10;IOObjectCollection&lt;PredictionModel&gt; io = new IOObjectCollection()&#10;&#10;for(int i = 0; i&lt; meta.getNumberOfModels();++i){&#10;&#9;io.add(meta.getModel(i))&#10;}&#10;return io&#10;&#10;&#10;"/>
          </operator>
          <operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Loop Collection" width="90" x="581" y="34">
            <process expanded="true">
              <operator activated="true" class="converters:dectree_2_example_set" compatibility="0.3.001" expanded="true" height="82" name="Decision Tree to ExampleSet" width="90" x="380" y="85"/>
              <connect from_port="single" to_op="Decision Tree to ExampleSet" to_port="tree"/>
              <connect from_op="Decision Tree to ExampleSet" from_port="exa" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Sonar" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="168"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>




    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist



    The point is that MetaCost is an ensemble operator. For one example you get one tree branch from every tree. These are then combined into a prediction. So the final model is not a single branch but a linear combination of many. For details, I would refer to the paper or the apply() code of MetaCostModel.
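
To make the "one branch per tree" idea concrete, here is a minimal sketch in plain Python (it is not RapidMiner code and does not use the RapidMiner API). The trees are hypothetical hand-written dictionaries, and the attribute names, thresholds and confidences are made up for illustration. The combination step is an assumption: the sketch averages the leaf confidences of all trees and then picks the class with the lowest expected cost, reading the cost matrix with rows as the predicted class and columns as the true class. The thread does not confirm that this is exactly what MetaCostModel does, so check the apply() code for the real logic.

```python
# Hypothetical trees: an inner node splits on one numeric attribute,
# a leaf carries class confidences. All values are illustrative.
tree_a = {
    "attribute": "attribute_11", "threshold": 0.2,
    "greater": {"confidences": {"Rock": 0.2, "Mine": 0.8}},
    "less_equal": {"confidences": {"Rock": 0.9, "Mine": 0.1}},
}
tree_b = {
    "attribute": "attribute_27", "threshold": 0.5,
    "greater": {
        "attribute": "attribute_11", "threshold": 0.3,
        "greater": {"confidences": {"Rock": 0.1, "Mine": 0.9}},
        "less_equal": {"confidences": {"Rock": 0.4, "Mine": 0.6}},
    },
    "less_equal": {"confidences": {"Rock": 0.7, "Mine": 0.3}},
}


def trace_path(tree, example):
    """Follow one example down one tree; return (path text, leaf confidences)."""
    conditions = []
    node = tree
    while "attribute" in node:  # leaves have no "attribute" key
        attr, thr = node["attribute"], node["threshold"]
        if example[attr] > thr:
            conditions.append(f"{attr} > {thr}")
            node = node["greater"]
        else:
            conditions.append(f"{attr} <= {thr}")
            node = node["less_equal"]
    return " & ".join(conditions), node["confidences"]


def combine(trees, example, classes, cost):
    """Collect one path per tree, average confidences, pick the cheapest class.

    Assumption: cost[i][j] is the cost of predicting classes[i]
    when the true class is classes[j].
    """
    paths = []
    avg = {c: 0.0 for c in classes}
    for tree in trees:
        path, conf = trace_path(tree, example)
        paths.append(path)
        for c in classes:
            avg[c] += conf[c] / len(trees)
    expected_cost = {
        predicted: sum(cost[i][j] * avg[true] for j, true in enumerate(classes))
        for i, predicted in enumerate(classes)
    }
    label = min(expected_cost, key=expected_cost.get)
    return paths, avg, label


classes = ["Rock", "Mine"]
cost = [[0.0, 2.0], [3.0, 0.0]]  # same numbers as the cost_matrix parameter
example = {"attribute_11": 0.25, "attribute_27": 0.8}

paths, avg, label = combine([tree_a, tree_b], example, classes, cost)
for number, path in enumerate(paths, start=1):
    print(f"tree {number}: {path}")
print("combined prediction:", label)
```

The useful part for the original question is `paths`: it holds one branch description per tree for the same example, which is what you would want to inspect before the branches are merged into a single prediction.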




    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    PB941 Member Posts: 6 Contributor II

    Hi sgenzer.

    Thank you very much for your help. I inserted the script and this was the result.

    PB941 Member Posts: 6 Contributor II

    Hello Martin. Thanks for your help. Yes, that is right, but I am trying to find out what each of those branches is. Is this possible?
