unfold option in Loop Collection / Collect operator

sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
edited November 2018 in Help
Hi, I discovered the interesting "unfold" option today in my RapidMiner journey and was quite happy to find it. If I understand it properly, it can union a collection of example sets with one operator (the same way the Append operator does when the number and type of attributes are identical)? Two questions:

1) Why on earth is it called "unfold"? I've been searching for this ability for a long, long time (and have been creating daisy chains of Union operators in the meantime). Why not call it "union"?
2) How can I get it to work? I put multiple example sets into the input and, with "unfold" checked, I would expect one big union example set to come out. But it does not work. Help?

Here I generate three example sets, collect them, and then try to "unfold" them using Collect:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.5.002">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Generate Data" width="90" x="45" y="120">
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
       <parameter key="number_of_attributes" value="3"/>
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Generate Data (3)" width="90" x="45" y="300">
       <parameter key="number_of_attributes" value="8"/>
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="112" name="Collect" width="90" x="246" y="165"/>
     <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="76" name="Collect (2)" width="90" x="380" y="165">
       <parameter key="unfold" value="true"/>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Collect" to_port="input 1"/>
     <connect from_op="Generate Data (2)" from_port="output" to_op="Collect" to_port="input 2"/>
     <connect from_op="Generate Data (3)" from_port="output" to_op="Collect" to_port="input 3"/>
     <connect from_op="Collect" from_port="collection" to_op="Collect (2)" to_port="input 1"/>
     <connect from_op="Collect (2)" from_port="collection" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
Here I do the same thing with Loop Collection:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.5.002">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Generate Data" width="90" x="45" y="120">
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
       <parameter key="number_of_attributes" value="3"/>
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Generate Data (3)" width="90" x="45" y="300">
       <parameter key="number_of_attributes" value="8"/>
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="112" name="Collect" width="90" x="246" y="165"/>
     <operator activated="true" class="loop_collection" compatibility="6.5.002" expanded="true" height="76" name="Loop Collection" width="90" x="380" y="165">
       <parameter key="unfold" value="true"/>
       <process expanded="true">
         <connect from_port="single" to_port="output 1"/>
         <portSpacing port="source_single" spacing="0"/>
         <portSpacing port="sink_output 1" spacing="0"/>
         <portSpacing port="sink_output 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Collect" to_port="input 1"/>
     <connect from_op="Generate Data (2)" from_port="output" to_op="Collect" to_port="input 2"/>
     <connect from_op="Generate Data (3)" from_port="output" to_op="Collect" to_port="input 3"/>
     <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
     <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
I must be missing something very simple?  Thanks.

Scott

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    The example below might help you. My understanding is that it combines collections of collections into a single collection; it does not union example sets.
    That might be a good feature request for future versions, though, as I also often loop over collections to join example sets together and find that memory handling can get tricky with larger collections.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="6.5.002" expanded="true" height="76" name="Coll1" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Collection1" width="90" x="45" y="30"/>
              <operator activated="true" class="multiply" compatibility="6.5.002" expanded="true" height="166" name="Multiply" width="90" x="179" y="30"/>
              <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="166" name="Collect" width="90" x="313" y="30"/>
              <connect from_op="Collection1" from_port="output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Collect" to_port="input 1"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Collect" to_port="input 2"/>
              <connect from_op="Multiply" from_port="output 3" to_op="Collect" to_port="input 3"/>
              <connect from_op="Multiply" from_port="output 4" to_op="Collect" to_port="input 4"/>
              <connect from_op="Multiply" from_port="output 5" to_op="Collect" to_port="input 5"/>
              <connect from_op="Multiply" from_port="output 6" to_op="Collect" to_port="input 6"/>
              <connect from_op="Collect" from_port="collection" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="subprocess" compatibility="6.5.002" expanded="true" height="76" name="Coll2" width="90" x="45" y="165">
            <process expanded="true">
              <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Collection2" width="90" x="45" y="30"/>
              <operator activated="true" class="multiply" compatibility="6.5.002" expanded="true" height="166" name="Multiply (2)" width="90" x="179" y="30"/>
              <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="166" name="Collect (2)" width="90" x="313" y="30"/>
              <connect from_op="Collection2" from_port="output" to_op="Multiply (2)" to_port="input"/>
              <connect from_op="Multiply (2)" from_port="output 1" to_op="Collect (2)" to_port="input 1"/>
              <connect from_op="Multiply (2)" from_port="output 2" to_op="Collect (2)" to_port="input 2"/>
              <connect from_op="Multiply (2)" from_port="output 3" to_op="Collect (2)" to_port="input 3"/>
              <connect from_op="Multiply (2)" from_port="output 4" to_op="Collect (2)" to_port="input 4"/>
              <connect from_op="Multiply (2)" from_port="output 5" to_op="Collect (2)" to_port="input 5"/>
              <connect from_op="Multiply (2)" from_port="output 6" to_op="Collect (2)" to_port="input 6"/>
              <connect from_op="Collect (2)" from_port="collection" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="subprocess" compatibility="6.5.002" expanded="true" height="76" name="Coll3" width="90" x="45" y="300">
            <process expanded="true">
              <operator activated="true" class="generate_data" compatibility="6.5.002" expanded="true" height="60" name="Collection3" width="90" x="45" y="30"/>
              <operator activated="true" class="multiply" compatibility="6.5.002" expanded="true" height="166" name="Multiply (3)" width="90" x="179" y="30"/>
              <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="166" name="Collect (3)" width="90" x="313" y="30"/>
              <connect from_op="Collection3" from_port="output" to_op="Multiply (3)" to_port="input"/>
              <connect from_op="Multiply (3)" from_port="output 1" to_op="Collect (3)" to_port="input 1"/>
              <connect from_op="Multiply (3)" from_port="output 2" to_op="Collect (3)" to_port="input 2"/>
              <connect from_op="Multiply (3)" from_port="output 3" to_op="Collect (3)" to_port="input 3"/>
              <connect from_op="Multiply (3)" from_port="output 4" to_op="Collect (3)" to_port="input 4"/>
              <connect from_op="Multiply (3)" from_port="output 5" to_op="Collect (3)" to_port="input 5"/>
              <connect from_op="Multiply (3)" from_port="output 6" to_op="Collect (3)" to_port="input 6"/>
              <connect from_op="Collect (3)" from_port="collection" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" breakpoints="after" class="collect" compatibility="6.5.002" expanded="true" height="112" name="Collect (4)" width="90" x="246" y="120">
            <description align="center" color="transparent" colored="false" width="126">Create a collection of collections.</description>
          </operator>
          <operator activated="true" class="collect" compatibility="6.5.002" expanded="true" height="76" name="Collect (5)" width="90" x="380" y="120">
            <parameter key="unfold" value="true"/>
            <description align="center" color="transparent" colored="false" width="126">'Unfold' the collection of collections.</description>
          </operator>
          <operator activated="false" class="flatten_collection" compatibility="6.5.002" expanded="true" height="60" name="Flatten Collection" width="90" x="380" y="255">
            <description align="center" color="transparent" colored="false" width="126">Flatten should have same effect.</description>
          </operator>
          <connect from_op="Coll1" from_port="out 1" to_op="Collect (4)" to_port="input 1"/>
          <connect from_op="Coll2" from_port="out 1" to_op="Collect (4)" to_port="input 2"/>
          <connect from_op="Coll3" from_port="out 1" to_op="Collect (4)" to_port="input 3"/>
          <connect from_op="Collect (4)" from_port="collection" to_op="Collect (5)" to_port="input 1"/>
          <connect from_op="Collect (5)" from_port="collection" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
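
To make the distinction in this reply concrete, here is a minimal Python sketch of the two behaviours being contrasted. It is not RapidMiner code: `ExampleSet`, `unfold` and `append_rows` are hypothetical stand-ins, and the recursive flattening is an assumption based on the comment above that Flatten Collection "should have same effect".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExampleSet:
    """Hypothetical stand-in for an example set: attribute names plus rows."""
    attributes: Tuple[str, ...]
    rows: Tuple[Tuple[float, ...], ...]


def unfold(collection):
    """Flatten nested collections (Python lists here) into one flat collection.

    The example sets themselves are left untouched; only the nesting goes away.
    Assumption: unfolding descends through every level of nesting.
    """
    flat = []
    for item in collection:
        if isinstance(item, list):
            flat.extend(unfold(item))
        else:
            flat.append(item)
    return flat


def append_rows(example_sets):
    """Row-wise union of example sets that share the same attributes."""
    first = example_sets[0]
    if any(es.attributes != first.attributes for es in example_sets):
        raise ValueError("all example sets must have identical attributes")
    rows = tuple(row for es in example_sets for row in es.rows)
    return ExampleSet(first.attributes, rows)


a = ExampleSet(("att1", "att2"), ((1.0, 2.0),))
b = ExampleSet(("att1", "att2"), ((3.0, 4.0), (5.0, 6.0)))
c = ExampleSet(("att1", "att2"), ((7.0, 8.0),))

nested = [[a, b], [c]]       # a collection of collections
flat = unfold(nested)        # still three separate example sets
merged = append_rows(flat)   # one example set holding all four rows
```

Under this reading, unfolding a collection that only contains example sets (as in the original question) returns the same three example sets, which would explain why nothing appears to happen. Getting one combined example set is a separate, row-wise step.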
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Thanks, but I am still unclear: that's not what the documentation says (http://docs.rapidminer.com/studio/operators/process_control/collections/loop_collection.html):

    unfold
    This parameter specifies whether collections received at the input ports should be unfolded. If the unfold parameter is set to true then the output will be the union of all elements of the input collections.

    Scott