How can I influence the order in which the LOOP COLLECTION operator processes its entries?

uenge-san Member Posts: 12 Contributor II
edited November 2018 in Help

Hi there,

I have a process in which I get data from a database for the last 7 days for up to 20 different machines.

In a second step, I want to do some aggregations and reports for each of the included machines. This is pretty straightforward with the GROUP INTO COLLECTION and LOOP COLLECTION operators. The problem is that the LOOP COLLECTION operator processes the machines in a seemingly random order, e.g. it starts with machine1, then machine3, machine5, machine10, machine6, ...

 

But I want to control the order, e.g. by ascending machine name, so that each run uses the same defined order.

 

Any suggestions? Of course I have tried things such as sorting before grouping, but without success.

 

Thanks in advance!

 

Best Answer

  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Solution Accepted

    Hi @uenge-san,

     

    Unfortunately Martin is right. At the moment the Group Into Collection operator generates a collection whose entries are in an undefined (and therefore effectively random) order.

    For the same dataset (for example, when you rerun a process) the order should be the same each time, but there is currently no way to request a specific order from the Group Into Collection operator.

    The Loop Collection operator just iterates over the collection in this order.

     

    I hope we can add an option to the Group Into Collection operator to output an ordered collection in the next release of the Operator Toolbox extension.

    Until then, the only workaround I can think of is to reorder the collection yourself after Group Into Collection.

    For example with this process, which basically uses two loops to reorder the collection. If you have a really large number of machines, this can of course take a while.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="34">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="0.8.000" expanded="true" height="82" name="Group Into Collection" width="90" x="380" y="34">
    <parameter key="group_by_attribute" value="Outlook"/>
    <description align="center" color="yellow" colored="true" width="126">Group the golf data according to the attribute Outlook</description>
    </operator>
    <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.8.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="380" y="289">
    <parameter key="Input Csv" value="Outlook&#10;rain&#10;sunny&#10;overcast"/>
    <description align="center" color="transparent" colored="false" width="126">Create an ExampleSet with the three values of the Outlook attribute</description>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="34"/>
    <operator activated="true" class="concurrency:loop" compatibility="8.0.001" expanded="true" height="103" name="Loop" width="90" x="648" y="136">
    <parameter key="number_of_iterations" value="3"/>
    <process expanded="true">
    <operator activated="true" class="extract_macro" compatibility="8.0.001" expanded="true" height="68" name="Extract Macro" width="90" x="112" y="85">
    <parameter key="macro" value="outlook_value"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="Outlook"/>
    <parameter key="example_index" value="%{iteration}"/>
    <list key="additional_macros"/>
    <description align="center" color="transparent" colored="false" width="126">Extract the name of the current value of the Outlook Attribute</description>
    </operator>
    <operator activated="true" class="loop_collection" compatibility="8.0.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="extract_macro" compatibility="8.0.001" expanded="true" height="68" name="Extract Macro (2)" width="90" x="45" y="34">
    <parameter key="macro" value="current_value_from_collection"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="Outlook"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    <description align="center" color="transparent" colored="false" width="126">Extract the value for the Outlook attribute from the current ExampleSet of the collection</description>
    </operator>
    <operator activated="true" class="branch" compatibility="8.0.001" expanded="true" height="103" name="Branch" width="90" x="179" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{outlook_value} == %{current_value_from_collection}"/>
    <process expanded="true">
    <connect from_port="input 1" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Only if both macros are the same, deliver the ExampleSet, otherwise do nothing</description>
    </operator>
    <connect from_port="single" to_op="Extract Macro (2)" to_port="example set"/>
    <connect from_op="Extract Macro (2)" from_port="example set" to_op="Branch" to_port="input 1"/>
    <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Loop over the collection and search for the corresponding ExampleSet</description>
    </operator>
    <connect from_port="input 1" to_op="Loop Collection" to_port="collection"/>
    <connect from_port="input 2" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Loop Collection" from_port="output 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="source_input 3" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Loop over the three different values from Create ExampleSet and search for the corresponding ExampleSet in the Collection</description>
    </operator>
    <operator activated="true" class="collect" compatibility="8.0.001" expanded="true" height="82" name="Collect" width="90" x="782" y="136">
    <parameter key="unfold" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Flatten resulting collection</description>
    </operator>
    <connect from_op="Retrieve Golf" from_port="output" to_op="Group Into Collection" to_port="exa"/>
    <connect from_op="Group Into Collection" from_port="col" to_op="Multiply" to_port="input"/>
    <connect from_op="Create ExampleSet" from_port="output" to_op="Loop" to_port="input 2"/>
    <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_op="Collect" to_port="input 1"/>
    <connect from_op="Collect" from_port="collection" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    Hopefully this helps a bit,

    Best regards,
    Fabian
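
The reordering logic of the process above (an outer Loop over the desired values, an inner Loop Collection with a Branch that only passes the matching ExampleSet) can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not RapidMiner code: the function name `reorder_collection` and the list-of-dicts stand-in for ExampleSets are assumptions made for this sketch.

```python
from typing import Dict, List

Row = Dict[str, str]
ExampleSet = List[Row]  # hypothetical stand-in for a RapidMiner ExampleSet


def reorder_collection(collection: List[ExampleSet],
                       desired_order: List[str],
                       key: str = "Outlook") -> List[ExampleSet]:
    """Return the entries of `collection` in the order given by `desired_order`."""
    reordered: List[ExampleSet] = []
    for wanted in desired_order:            # outer Loop over the desired values
        for example_set in collection:      # inner Loop Collection
            if not example_set:
                continue                    # skip empty groups
            current = example_set[0][key]   # value taken from the first example
            if current == wanted:           # Branch: pass through only on a match
                reordered.append(example_set)
    return reordered


# A collection in arbitrary order, grouped by Outlook as in the Golf example.
collection = [
    [{"Outlook": "sunny", "Play": "no"}, {"Outlook": "sunny", "Play": "yes"}],
    [{"Outlook": "overcast", "Play": "yes"}],
    [{"Outlook": "rain", "Play": "yes"}],
]
desired_order = ["rain", "sunny", "overcast"]

reordered = reorder_collection(collection, desired_order)
order = [example_set[0]["Outlook"] for example_set in reordered]
print(order)
```

The nested loops compare every desired value against every entry, so the work grows with the product of both counts, which is why this approach can become slow for a large number of machines.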

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    OK, this is a question for the collection maestro @mschmitz :)

     

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    I think this is currently not possible. Maybe @tftemme knows a way? He wrote the operator.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Here's a quick example of how to do it with the Loop operator.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Generate Collection" width="90" x="112" y="85">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Deals"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Deals-Testset" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Deals-Testset"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="238">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Golf-Testset" width="90" x="45" y="340">
    <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="391">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="collect" compatibility="7.6.001" expanded="true" height="166" name="Collect" width="90" x="179" y="136"/>
    <connect from_op="Retrieve Deals" from_port="output" to_op="Collect" to_port="input 1"/>
    <connect from_op="Retrieve Deals-Testset" from_port="output" to_op="Collect" to_port="input 2"/>
    <connect from_op="Retrieve Golf" from_port="output" to_op="Collect" to_port="input 3"/>
    <connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Collect" to_port="input 4"/>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Collect" to_port="input 5"/>
    <connect from_op="Collect" from_port="collection" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">You need to know the number of objects in your collection. There's various ways to do this so will leave as an exercise for the reader.</description>
    </operator>
    <operator activated="true" class="concurrency:loop" compatibility="7.6.001" expanded="true" height="82" name="Loop" width="90" x="380" y="85">
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" class="select" compatibility="7.6.001" expanded="true" height="68" name="Select" width="90" x="112" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="380" y="34">
    <list key="function_descriptions">
    <parameter key="Collection" value="%{iteration}"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Do stuff</description>
    </operator>
    <connect from_port="input 1" to_op="Select" to_port="collection"/>
    <connect from_op="Select" from_port="selected" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Turn off parallel execution for this so it runs in order.</description>
    </operator>
    <connect from_op="Generate Collection" from_port="out 1" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    If the problem is that you can't guarantee that your machines are in the collection in a certain order, you could also use a Loop and, inside it, a Loop Collection that passes through only the collection entry for your desired machine number.
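
As a rough illustration of looping in a defined order, the Python sketch below first derives the order from the machine names and then visits the entries in that order. It is not RapidMiner code: the attribute name `machine`, the helper names `natural_key` and `loop_in_order`, and the list-of-dicts representation are all assumptions made for this example. It also shows one pitfall of "ascending names": a plain string sort puts machine10 before machine3, so a natural sort key is used.

```python
import re
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]
ExampleSet = List[Row]  # hypothetical stand-in for a RapidMiner ExampleSet


def natural_key(name: str) -> list:
    """Split a name into text and number parts so that machine10 sorts after machine6."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def loop_in_order(collection: List[ExampleSet],
                  key: str = "machine") -> Iterator[Tuple[str, ExampleSet]]:
    """Yield (name, example_set) pairs in ascending natural order of the name."""
    # Index each non-empty entry by the value in its first example.
    by_name = {es[0][key]: es for es in collection if es}
    for name in sorted(by_name, key=natural_key):
        yield name, by_name[name]


# Entries in the arbitrary order described in the question.
names = ["machine1", "machine3", "machine5", "machine10", "machine6"]
collection = [[{"machine": n, "value": "0"}] for n in names]

visited = [name for name, _ in loop_in_order(collection)]
plain = sorted(names)  # plain string sort, for comparison
print(visited)
print(plain)
```

Building the name-to-entry lookup once avoids the nested search over the collection, at the cost of assuming that each machine name occurs in exactly one entry.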

  • uenge-san Member Posts: 12 Contributor II

    Hi Fabian, @tftemme

     

    Thanks for your answer and the example process.

     

    I also used a double-loop workaround, which of course lacks the simplicity of the LOOP COLLECTION operator...

     

    So I'm looking forward to a new release of the Operator Toolbox extension ;)

     

    BR
    Martin

     

     
