X-Validation Bug

marcin_blachnik Member Posts: 61 Guru
edited November 2018 in Help
Hello

I have noticed a bug in the X-Validation operator. I guess you forgot to make a clone of the test set, because whenever I want to access the training set in the testing subprocess of X-Validation, I only get a view of the test samples. This bug appears both when I use the through port and when I use Remember/Recall. For now, the problem can be worked around by materializing the training data before connecting it to the through port.

Here is a process that reproduces the bug:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.007">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
    <process expanded="true" height="625" width="926">
      <operator activated="true" class="retrieve" compatibility="5.2.007" expanded="true" height="60" name="Retrieve" width="90" x="71" y="33">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.2.007" expanded="true" height="112" name="Validation" width="90" x="214" y="34">
        <process expanded="true" height="625" width="438">
          <operator activated="true" class="default_model" compatibility="5.2.007" expanded="true" height="76" name="Default Model" width="90" x="112" y="30"/>
          <connect from_port="training" to_op="Default Model" to_port="training set"/>
          <connect from_op="Default Model" from_port="model" to_port="model"/>
          <connect from_op="Default Model" from_port="exampleSet" to_port="through 1"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <portSpacing port="sink_through 2" spacing="0"/>
        </process>
        <process expanded="true" height="625" width="438">
          <operator activated="true" breakpoints="before" class="k_nn" compatibility="5.2.007" expanded="true" height="76" name="k-NN" width="90" x="45" y="120"/>
          <operator activated="true" class="apply_model" compatibility="5.2.007" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.007" expanded="true" height="76" name="Performance" width="90" x="313" y="30"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_port="through 1" to_op="k-NN" to_port="training set"/>
          <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="source_through 2" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="training" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
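The materialization workaround mentioned above could be applied to this example by replacing the training subprocess (the first inner `<process>` element of the Validation operator) with something like the fragment below. This is an untested sketch: it assumes that RapidMiner 5's Materialize Data operator is registered under the class key `materialize_data` with the ports `example set input` and `example set output`; the operator coordinates and height are arbitrary. The testing subprocess stays unchanged.

```xml
<process expanded="true" height="625" width="438">
  <operator activated="true" class="default_model" compatibility="5.2.007" expanded="true" height="76" name="Default Model" width="90" x="112" y="30"/>
  <!-- Assumed class key and port names for the Materialize Data operator -->
  <operator activated="true" class="materialize_data" compatibility="5.2.007" expanded="true" height="94" name="Materialize Data" width="90" x="246" y="120"/>
  <connect from_port="training" to_op="Default Model" to_port="training set"/>
  <connect from_op="Default Model" from_port="model" to_port="model"/>
  <!-- Copy the training view into a standalone example set before passing it on -->
  <connect from_op="Default Model" from_port="exampleSet" to_op="Materialize Data" to_port="example set input"/>
  <connect from_op="Materialize Data" from_port="example set output" to_port="through 1"/>
  <portSpacing port="source_training" spacing="0"/>
  <portSpacing port="sink_model" spacing="0"/>
  <portSpacing port="sink_through 1" spacing="0"/>
  <portSpacing port="sink_through 2" spacing="0"/>
</process>
```

Because the materialized copy no longer shares its underlying partition with the validation's example set, the k-NN learner in the testing subprocess should then see the training samples rather than the test samples.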

I would also suggest that connecting the model output of the X-Validation training subprocess should not be mandatory for the main process to run. Currently, a dummy operator (such as Default Model in the example above) has to be added just to satisfy this requirement.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Marcin,

    thanks for your report. I created a bug report for this: http://bugs.rapid-i.com/show_bug.cgi?id=1206

    We will probably not change the behaviour of the model output though, since 99% of the users will use it in the "normal" way, and the warning/error will help a lot of new (and probably also experienced but forgetful) users.

    Best, Marius
  • marcin_blachnik Member Posts: 61 Guru
    Thank you for your response.

    I just want to mention that it also appears when using the Remember/Recall operators, i.e. Remember in the training subprocess and Recall in the testing subprocess.
    It is very confusing.
    Moreover, it doesn't appear in the parallel X-Validation or in Bootstrapping Validation, but it does also appear in Split Validation.

    Best regards
    Marcin
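The Remember/Recall variant described in the reply above might look like the following changes to the bug example, replacing the `through 1` connection. This is an unverified sketch: the class keys `remember`/`recall`, their `name`/`io_object` parameters, and the port names `store` and `result` are assumptions about the RapidMiner 5 operators, and the storage name `training_data` is hypothetical.

```xml
<!-- Training subprocess: store the training example set instead of using "through 1" -->
<operator activated="true" class="remember" compatibility="5.2.007" expanded="true" height="60" name="Remember" width="90" x="246" y="120">
  <parameter key="name" value="training_data"/> <!-- hypothetical storage name -->
  <parameter key="io_object" value="ExampleSet"/>
</operator>
<connect from_op="Default Model" from_port="exampleSet" to_op="Remember" to_port="store"/>

<!-- Testing subprocess: retrieve the stored set and train k-NN on it -->
<operator activated="true" class="recall" compatibility="5.2.007" expanded="true" height="60" name="Recall" width="90" x="45" y="210">
  <parameter key="name" value="training_data"/> <!-- must match the name used by Remember -->
  <parameter key="io_object" value="ExampleSet"/>
</operator>
<connect from_op="Recall" from_port="result" to_op="k-NN" to_port="training set"/>
```

According to the report, the recalled example set in this setup shows only the test samples, just as with the through port.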