RapidMiner

Apply same process to different example sets

Guru

Hi,

I want to apply the same process to several different example sets, one after another, in a kind of batch or loop over all of them. Is that possible?

I also want to collect the results separately for each example set, so that the result of one process run is not overwritten by the run on the next example set.

Do I have to use the Collect operator for this, so that I get one result per example set in a collection? If so, how do I collect the results that way?

3 REPLIES
RM Staff

Re: Apply same process to different example sets

Hi @Fred12

You can use 'Collect' and then 'Loop Collection'. See this example process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf" width="90" x="179" y="30">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf-Testset" width="90" x="179" y="120">
        <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Iris" width="90" x="179" y="238">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="collect" compatibility="7.3.000" expanded="true" height="124" name="Collect" width="90" x="447" y="34"/>
      <operator activated="true" class="loop_collection" compatibility="7.3.000" expanded="true" height="82" name="Loop Collection" width="90" x="648" y="34">
        <parameter key="set_iteration_macro" value="true"/>
        <process expanded="true">
          <operator activated="true" class="split_validation" compatibility="7.3.000" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
            <process expanded="true">
              <operator activated="true" class="k_nn" compatibility="7.3.000" expanded="true" height="76" name="k-NN" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="k-NN" to_port="training set"/>
              <connect from_op="k-NN" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="7.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
              <portSpacing port="sink_averagable 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="store" compatibility="7.3.000" expanded="true" height="68" name="Store" width="90" x="514" y="34">
            <parameter key="repository_entry" value="//Template/Model/Model %{iteration}"/>
          </operator>
          <connect from_port="single" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_op="Store" to_port="input"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Golf" from_port="output" to_op="Collect" to_port="input 1"/>
      <connect from_op="Golf-Testset" from_port="output" to_op="Collect" to_port="input 2"/>
      <connect from_op="Iris" from_port="output" to_op="Collect" to_port="input 3"/>
      <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Guru

Re: Apply same process to different example sets

OK, that kind of works, but the results for each example set are appended to the same log, one under the other. That makes it hard to tell the outputs of the different example sets apart. I want a separate results output for each example set. Is that somehow possible?

Otherwise, I would have to address each result by some index and separate them by hand.

RM Certified Expert

Re: Apply same process to different example sets

You'd have to create and use a file_name macro to append the results for each dataset.
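
One possible reading of this suggestion, sketched against the example process earlier in the thread: Loop Collection there already sets the iteration macro (set_iteration_macro is true), so the same %{iteration} placeholder that names the stored models could also name a separate repository entry for each performance result. The operator name "Store Performance" and the repository path //Template/Performance are assumptions made up for illustration and do not come from the thread. The fragment is meant to go inside the Loop Collection subprocess and to replace the existing connection from the Validation operator's "averagable 1" port to "output 1".

```xml
<!-- Sketch only: place inside the Loop Collection subprocess, next to the existing "Store" operator. -->
<!-- "Store Performance" and //Template/Performance are hypothetical names, not from the thread. -->
<operator activated="true" class="store" compatibility="7.3.000" expanded="true" height="68" name="Store Performance" width="90" x="514" y="136">
  <!-- %{iteration} is the macro set by Loop Collection, so each example set gets its own entry -->
  <parameter key="repository_entry" value="//Template/Performance/Performance %{iteration}"/>
</operator>
<!-- Route the performance vector through the new Store operator, then on to the loop output -->
<connect from_op="Validation" from_port="averagable 1" to_op="Store Performance" to_port="input"/>
<connect from_op="Store Performance" from_port="through" to_port="output 1"/>
```

With something like this in place, each run would write its performance to its own entry (Performance 1, Performance 2, and so on) rather than sharing one output, assuming the iteration macro starts at its default value.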