Apply same process to different example sets

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help


I want to apply the same process to several different example sets, one example set after another, in a kind of batch or loop over all of them. Is that possible?


In the same way, I want to collect the results separately for each example set, so that the result of one process run is not overwritten by the next run (on the second example set), and so on.

Do I have to do this with the Collection operator only, so that I end up with one result per example set in a collection? If so, how do I collect the results that way?


  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    Hi @Fred12

    You can use 'Collect' and then 'Loop Collection'. You can refer to this example process:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf" width="90" x="179" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf-Testset" width="90" x="179" y="120">
            <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Iris" width="90" x="179" y="238">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="collect" compatibility="7.3.000" expanded="true" height="124" name="Collect" width="90" x="447" y="34"/>
          <operator activated="true" class="loop_collection" compatibility="7.3.000" expanded="true" height="82" name="Loop Collection" width="90" x="648" y="34">
            <parameter key="set_iteration_macro" value="true"/>
            <process expanded="true">
              <operator activated="true" class="split_validation" compatibility="7.3.000" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
                <process expanded="true">
                  <operator activated="true" class="k_nn" compatibility="7.3.000" expanded="true" height="76" name="k-NN" width="90" x="112" y="30"/>
                  <connect from_port="training" to_op="k-NN" to_port="training set"/>
                  <connect from_op="k-NN" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="7.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                  <portSpacing port="sink_averagable 3" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="store" compatibility="7.3.000" expanded="true" height="68" name="Store" width="90" x="514" y="34">
                <parameter key="repository_entry" value="//Template/Model/Model %{iteration}"/>
              </operator>
              <connect from_port="single" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_op="Store" to_port="input"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Golf" from_port="output" to_op="Collect" to_port="input 1"/>
          <connect from_op="Golf-Testset" from_port="output" to_op="Collect" to_port="input 2"/>
          <connect from_op="Iris" from_port="output" to_op="Collect" to_port="input 3"/>
          <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Fred12 Member Posts: 344 Unicorn

    OK, that kind of works, but the results for each example set are appended to the same log, directly under those of the previous example set. That makes it hard to tell the outputs of the different example sets apart. I want a separate results output for each example set. Is that somehow possible?


    Otherwise, I would have to address each result by some index and separate them by hand.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You'd have to create and use a file_name macro to append the results for each dataset. 
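
    As a rough sketch of what macro-based naming could look like, the fragment below would go inside the Loop Collection subprocess of the example process above. It stores each performance vector under its own repository entry. Note the assumptions: it reuses the %{iteration} macro that the example already sets rather than a file_name macro, the operator name "Store Performance" and the path //Template/Results/ are made up for illustration, and the two connect lines would replace the existing connection from Validation "averagable 1" to "output 1".

    ```xml
    <!-- Hypothetical operator: writes one entry per example set, e.g. "Performance 1", "Performance 2", ... -->
    <operator activated="true" class="store" compatibility="7.3.000" expanded="true" name="Store Performance">
      <!-- Illustrative path; %{iteration} is set because set_iteration_macro is true on Loop Collection -->
      <parameter key="repository_entry" value="//Template/Results/Performance %{iteration}"/>
    </operator>
    <!-- Route the performance through the new Store operator before it leaves the loop -->
    <connect from_op="Validation" from_port="averagable 1" to_op="Store Performance" to_port="input"/>
    <connect from_op="Store Performance" from_port="through" to_port="output 1"/>
    ```

    With something like this in place, each example set should end up with its own stored result instead of everything landing in one shared output.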
