Bug in RM-5 with operator execution order

Stefan_E Member Posts: 53 Maven
edited November 2018 in Help
Hi all,

I have the process below. In its current state it obviously doesn't make much sense, but it's the result of trying to isolate a very strange behavior. Let me explain:

The calculated performance vector changes completely when I remove the Apply Model operator named Apply Error. This is (at least to me) unexpected, since the only operators that use random numbers (Sample and MetaCost) have local random seeds enabled.

When I change the operator execution order so that Apply Error comes last, I get the same results as if I remove Apply Error ...

Obviously, the eventual goal is to route the output of Apply Error to the model coming out of MetaCost. The reason for all this is that a cross-validation gives me better results than a test against entirely unseen data. My suspicion is that I have insufficient information in my attribute set, but this strange behavior of RM-5 now casts some doubt on the entire exercise.

I'm willing to share the underlying .csv with Rapid-I (assuming confidentiality, of course) if this helps in understanding the issue further.

Kind regards,
Stefan
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="746" width="975">
      <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="file_name" value="C:\Users\eichenbe\Documents\Backup\Laptop\LiveCopy\Software\RapidMiner_5\32401_P.csv"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set ID" width="90" x="179" y="30">
        <parameter key="name" value="DieID"/>
        <parameter key="target_role" value="id"/>
      </operator>
      <operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="313" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="t_32401"/>
        <parameter key="max" value="74.1"/>
      </operator>
      <operator activated="true" class="remap_binominals" expanded="true" height="76" name="Remap Binominals" width="90" x="447" y="30">
        <parameter key="negative_value" value="false"/>
        <parameter key="positive_value" value="true"/>
      </operator>
      <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="581" y="30"/>
      <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Training (2)" width="90" x="45" y="300">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="Wafer&lt;=6"/>
      </operator>
      <operator activated="true" class="remove_attribute_range" expanded="true" height="76" name="Remove Wafer (2)" width="90" x="179" y="300">
        <parameter key="first_attribute" value="2"/>
        <parameter key="last_attribute" value="2"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Label (2)" width="90" x="313" y="300">
        <parameter key="name" value="t_32401"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Training" width="90" x="45" y="165">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="Wafer&lt;=6"/>
      </operator>
      <operator activated="true" class="remove_attribute_range" expanded="true" height="76" name="Remove Wafer" width="90" x="179" y="165">
        <parameter key="first_attribute" value="2"/>
        <parameter key="last_attribute" value="2"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Label" width="90" x="313" y="165">
        <parameter key="name" value="t_32401"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="447" y="165"/>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Error" width="90" x="447" y="300">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="sample" expanded="true" height="76" name="Sample" width="90" x="581" y="165">
        <parameter key="sample" value="relative"/>
        <parameter key="sample_ratio" value="0.3"/>
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="metacost" expanded="true" height="76" name="MetaCost (2)" width="90" x="715" y="165">
        <parameter key="cost_matrix" value="[0.0 1.0;10.0 0.0]"/>
        <parameter key="use_subset_for_training" value="0.7"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="100"/>
        <process expanded="true" height="773" width="912">
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM (2)" width="90" x="411" y="30">
            <parameter key="gamma" value="0.01"/>
            <parameter key="C" value="10000.0"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="training set" to_op="SVM (2)" to_port="training set"/>
          <connect from_op="SVM (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="715" y="300">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_binominal_classification" expanded="true" height="76" name="Performance" width="90" x="715" y="435">
        <parameter key="main_criterion" value="precision"/>
        <parameter key="AUC (optimistic)" value="true"/>
        <parameter key="precision" value="true"/>
        <parameter key="false_positive" value="true"/>
        <parameter key="false_negative" value="true"/>
        <parameter key="true_positive" value="true"/>
        <parameter key="true_negative" value="true"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set ID" to_port="example set input"/>
      <connect from_op="Set ID" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
      <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Remap Binominals" to_port="example set input"/>
      <connect from_op="Remap Binominals" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Filter Training" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Filter Training (2)" to_port="example set input"/>
      <connect from_op="Filter Training (2)" from_port="example set output" to_op="Remove Wafer (2)" to_port="example set input"/>
      <connect from_op="Remove Wafer (2)" from_port="example set output" to_op="Set Label (2)" to_port="example set input"/>
      <connect from_op="Set Label (2)" from_port="example set output" to_op="Apply Error" to_port="unlabelled data"/>
      <connect from_op="Filter Training" from_port="example set output" to_op="Remove Wafer" to_port="example set input"/>
      <connect from_op="Remove Wafer" from_port="example set output" to_op="Set Label" to_port="example set input"/>
      <connect from_op="Set Label" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Sample" to_port="example set input"/>
      <connect from_op="Normalize" from_port="preprocessing model" to_op="Apply Error" to_port="model"/>
      <connect from_op="Sample" from_port="example set output" to_op="MetaCost (2)" to_port="training set"/>
      <connect from_op="MetaCost (2)" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="MetaCost (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <connect from_op="Performance" from_port="performance" to_port="result 2"/>
      <connect from_op="Performance" from_port="example set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Stefan,
    did you check the execution order? The button in the top right corner of the process tab switches to the execution order view. In this view the execution order can also be changed.
    When you look into why this could occur, please keep in mind that an example set is not copied in memory when it passes through an IOMultiplier. Instead, only a second view on the underlying table is created. If one branch of the process changes the data, the change shows up in the other view too, provided both views share that particular example and attribute.

    To reproduce this, you could either send me the data, or replace your data with an example or data generator and adapt (and simplify) your process so that the problem still occurs but the process is reduced to its minimal size. That would ease my understanding a lot :)

    Greetings,
      Sebastian
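
To make the shared-view point above concrete, here is a minimal sketch in Python. It is not RapidMiner code: `Table`, `View`, `multiply` and `run` are hypothetical names invented for illustration, and the in-place scaling merely stands in for any operator that writes to the underlying table. It only shows how two views on one table can make a result depend on execution order, and how a real copy removes that dependence.

```python
class Table:
    """Hypothetical stand-in for the underlying data table."""

    def __init__(self, rows):
        self.rows = rows


class View:
    """Hypothetical stand-in for an example set: a window onto a Table, not a copy."""

    def __init__(self, table):
        self.table = table

    def values(self):
        # Read the current contents of the underlying table.
        return list(self.table.rows)

    def scale_in_place(self, factor):
        # Write through the view into the underlying table.
        self.table.rows = [v * factor for v in self.table.rows]


def multiply(view, copy=False):
    """Return two outputs: views on the same table, or independent copies."""
    if copy:
        return (View(Table(list(view.table.rows))),
                View(Table(list(view.table.rows))))
    return View(view.table), View(view.table)


def run(order, copy=False):
    """Execute the two branches in the given order and return what branch A reads."""
    branch_a, branch_b = multiply(View(Table([1.0, 2.0, 4.0])), copy=copy)
    seen = None
    for step in order:
        if step == "modify_b":
            branch_b.scale_in_place(10.0)  # branch B changes the data
        elif step == "read_a":
            seen = branch_a.values()       # branch A consumes the data
    return seen


print(run(["modify_b", "read_a"]))             # B runs first: A sees changed data
print(run(["read_a", "modify_b"]))             # B runs last: A sees original data
print(run(["modify_b", "read_a"], copy=True))  # real copies: order no longer matters
```

In this sketch, moving the modifying branch to the end has the same effect on branch A as removing it, which matches the symptom described in the question. Whether this is what actually happens in the posted process would still need to be confirmed with the data.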