
X-Validation Bug

marcin_blachnik Member Posts: 61 Guru
edited November 2018 in Help
Hello

I have noticed a bug in the X-Validation operator. I guess you forgot to make a clone of the test set, because whenever I try to access the training set on the test side of X-Validation, I only get a view of the test samples. The bug appears both when I use the through port and when I use Remember/Recall. For now, the problem can be worked around by materializing the training data before connecting it to the through port.

An example process that reproduces the bug is provided below:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.007">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
    <process expanded="true" height="625" width="926">
      <operator activated="true" class="retrieve" compatibility="5.2.007" expanded="true" height="60" name="Retrieve" width="90" x="71" y="33">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.2.007" expanded="true" height="112" name="Validation" width="90" x="214" y="34">
        <process expanded="true" height="625" width="438">
          <operator activated="true" class="default_model" compatibility="5.2.007" expanded="true" height="76" name="Default Model" width="90" x="112" y="30"/>
          <connect from_port="training" to_op="Default Model" to_port="training set"/>
          <connect from_op="Default Model" from_port="model" to_port="model"/>
          <connect from_op="Default Model" from_port="exampleSet" to_port="through 1"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <portSpacing port="sink_through 2" spacing="0"/>
        </process>
        <process expanded="true" height="625" width="438">
          <operator activated="true" breakpoints="before" class="k_nn" compatibility="5.2.007" expanded="true" height="76" name="k-NN" width="90" x="45" y="120"/>
          <operator activated="true" class="apply_model" compatibility="5.2.007" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.007" expanded="true" height="76" name="Performance" width="90" x="313" y="30"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_port="through 1" to_op="k-NN" to_port="training set"/>
          <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="source_through 2" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="training" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
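
The workaround mentioned above (materializing the training data before it reaches the through port) could look roughly like the following replacement for the training subprocess of Validation. This is only a sketch: the operator key `materialize_data` and the port names `example set input` / `example set output` are assumptions based on the standard Materialize Data operator, and the layout coordinates are arbitrary.

```xml
<!-- Training subprocess of Validation with the workaround applied. -->
<process expanded="true" height="625" width="438">
  <operator activated="true" class="default_model" compatibility="5.2.007" expanded="true" height="76" name="Default Model" width="90" x="112" y="30"/>
  <!-- Assumed operator key and port names; copies the data so the test side no longer sees a shared view. -->
  <operator activated="true" class="materialize_data" compatibility="5.2.007" expanded="true" height="76" name="Materialize Data" width="90" x="246" y="30"/>
  <connect from_port="training" to_op="Default Model" to_port="training set"/>
  <connect from_op="Default Model" from_port="model" to_port="model"/>
  <connect from_op="Default Model" from_port="exampleSet" to_op="Materialize Data" to_port="example set input"/>
  <connect from_op="Materialize Data" from_port="example set output" to_port="through 1"/>
  <portSpacing port="source_training" spacing="0"/>
  <portSpacing port="sink_model" spacing="0"/>
  <portSpacing port="sink_through 1" spacing="0"/>
  <portSpacing port="sink_through 2" spacing="0"/>
</process>
```

With this change, the k-NN operator on the test side should receive the materialized training examples rather than the view that has been switched to the test partition.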

I would also suggest that connecting the model output of the training subprocess of X-Validation should not be required in order to execute the main process. At the moment, a dummy operator has to be added just to make the process run.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Marcin,

    thanks for your report. I created a bug report for this: http://bugs.rapid-i.com/show_bug.cgi?id=1206

    We will probably not change the behaviour of the model output though, since 99% of the users will use it in the "normal" way, and the warning/error will help a lot of new (and probably also experienced but forgetful) users.

    Best, Marius
  • marcin_blachnik Member Posts: 61 Guru
    Thank you for your response.

    I just want to mention that it also appears when using the Remember/Recall operators, that is, when I use Remember on the training side and Recall on the test side.
    It is very confusing.
    Moreover, it doesn't appear in Parallel X-Validation or Bootstrapping Validation, but it does appear in Split Validation.

    Best regards
    Marcin
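
    The Remember/Recall variant described above might be wired roughly as follows. This is a hedged sketch, not the process that was actually run: the store name `training_data` is a hypothetical placeholder, and the port names `store` and `result` are assumed from the standard Remember and Recall operators.

    ```xml
    <!-- Training side: store the training data under a hypothetical name. -->
    <operator activated="true" class="remember" compatibility="5.2.007" expanded="true" name="Remember">
      <parameter key="name" value="training_data"/>
      <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <connect from_op="Default Model" from_port="exampleSet" to_op="Remember" to_port="store"/>

    <!-- Test side: recall the stored data and train k-NN on it. -->
    <operator activated="true" class="recall" compatibility="5.2.007" expanded="true" name="Recall">
      <parameter key="name" value="training_data"/>
      <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <connect from_op="Recall" from_port="result" to_op="k-NN" to_port="training set"/>
    ```

    Setting a breakpoint before k-NN, as in the original example, shows which examples actually arrive on the test side.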