RapidMiner

Handling collections: behavior of union vs append

SOLVED
Regular Contributor

Hi,

 

Seems things have been picking up quite a bit lately. Especially like the new search features in 7.2.

 

Right now, "append" seems to be the only operator that will directly merge the example sets within a given collection.

However, "union" is sometimes the better option, for example when the example sets do not all share the same attributes. Unfortunately, it does not accept a collection the way append does. This seems like a minor oversight, and I hope it's addressed soon.
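
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is only a model, not RapidMiner code: an example set is assumed to be a list of dict rows, `None` stands in for a missing value, and the function names `attributes`, `append` and `union` are made up for this illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
ExampleSet = List[Row]


def attributes(example_set: ExampleSet) -> List[str]:
    """Return the attribute names in first-seen order."""
    names: List[str] = []
    for row in example_set:
        for name in row:
            if name not in names:
                names.append(name)
    return names


def append(a: ExampleSet, b: ExampleSet) -> ExampleSet:
    """Stack the rows; in this model both sets must have the same attributes."""
    if set(attributes(a)) != set(attributes(b)):
        raise ValueError("append needs identical attributes")
    return a + b


def union(a: ExampleSet, b: ExampleSet) -> ExampleSet:
    """Stack the rows over the superset of attributes; gaps become None."""
    names = attributes(a + b)
    return [{name: row.get(name) for name in names} for row in a + b]


left = [{"id": 1, "a": 0.5}]
right = [{"id": 2, "b": "x"}]
print(union(left, right))
# [{'id': 1, 'a': 0.5, 'b': None}, {'id': 2, 'a': None, 'b': 'x'}]
```

Calling `append(left, right)` on the same inputs raises `ValueError` in this model, which is the case where union is the only option.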

 

In the meantime, I have hacked together this building block that emulates this functionality.

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.2.000" expanded="true" height="82" name="Union Append" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="loop_collection" compatibility="7.2.000" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
<parameter key="set_iteration_macro" value="true"/>
<process expanded="true">
<operator activated="false" breakpoints="after" class="select" compatibility="7.2.000" expanded="true" height="68" name="Select (5)" width="90" x="112" y="34">
<parameter key="index" value="%{iteration}"/>
</operator>
<operator activated="true" class="branch" compatibility="7.2.000" expanded="true" height="82" name="Branch (2)" width="90" x="313" y="34">
<parameter key="condition_type" value="expression"/>
<parameter key="expression" value="%{iteration}==1"/>
<process expanded="true">
<connect from_port="condition" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="recall" compatibility="7.2.000" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
<parameter key="name" value="LoopData"/>
</operator>
<operator activated="true" class="union" compatibility="7.2.000" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
<connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
<connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
<connect from_op="Union (2)" from_port="union" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="7.2.000" expanded="true" height="68" name="Remember (5)" width="90" x="581" y="34">
<parameter key="name" value="LoopData"/>
</operator>
<connect from_port="single" to_op="Branch (2)" to_port="condition"/>
<connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
<connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select" compatibility="7.2.000" expanded="true" height="68" name="Select (6)" width="90" x="179" y="34">
<parameter key="index" value="%{iteration}"/>
</operator>
<connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
<connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
<connect from_op="Select (6)" from_port="selected" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
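
For readers who prefer to see the logic without opening the process, the building block above boils down to a running union over the collection. The sketch below is a rough Python model of that wiring, under the same assumptions as before (example sets as lists of dicts, `None` for missing values). The dict `store` stands in for Remember/Recall and `union_collection` is a hypothetical name, not a RapidMiner API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
ExampleSet = List[Row]


def union(a: ExampleSet, b: ExampleSet) -> ExampleSet:
    """Stack rows over the superset of attributes; gaps become None."""
    names: List[str] = []
    for row in a + b:
        for name in row:
            if name not in names:
                names.append(name)
    return [{name: row.get(name) for name in names} for row in a + b]


def union_collection(collection: List[ExampleSet]) -> ExampleSet:
    """Fold a collection into one example set, mirroring the loop above."""
    store: Dict[str, ExampleSet] = {}  # stands in for Remember / Recall
    for iteration, example_set in enumerate(collection, start=1):
        if iteration == 1:
            # First branch: nothing stored yet, pass the set straight through.
            merged = example_set
        else:
            # Second branch: current set is input 1, recalled set is input 2.
            merged = union(example_set, store.pop("LoopData"))
        store["LoopData"] = merged  # Remember
    # The process picks the last loop output, i.e. the final accumulated set.
    return store.get("LoopData", [])


collection = [[{"a": 1}], [{"b": 2}], [{"a": 3, "c": 4}]]
result = union_collection(collection)
print(len(result), sorted(result[0]))
# 3 ['a', 'b', 'c']
```

Because the current set is wired to the first union input, the rows in this model come out in reverse collection order. Whether that matters depends on your data.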

Hope someone else finds this useful too.

 

Cheers,

 

5 REPLIES
RMStaff

Re: Handling collections: behavior of union vs append

Indeed - busy times here and in general among the RM users :D

And thanks for your kind words. It is funny, but the new search - despite not being a very exciting feature on its own - is also one of my favorites in 7.2...

 

I will forward this message to the engineers to look into Union.  But for now many thanks for the building block and happy mining,

Ingo


Elite II

Re: Handling collections: behavior of union vs append

Just want to second the request for the "Union" operator to work like the "Append" operator when it follows a collection. I run into this issue all the time.

 

Thanks!


Scott

Scott Genzer
Certified RapidMiner Analyst
Genzer Consulting

Re: Handling collections: behavior of union vs append

Hello @sgenzer

 

Thank you for such a clean solution. Would you be OK with us publishing it in our Building Blocks section?

http://community.rapidminer.com/t5/Building-Blocks/bd-p/BB

I'll provide a link and credit you, obviously. Thank you!

Elite II

Re: Handling collections: behavior of union vs append

I agree it's a nice clean solution, but it's not mine. :) It's aruberutou's - see above.

 

Scott

Elite III

Re: Handling collections: behavior of union vs append

Very clean solution and a very good building block to add. 

 

I know from building similar processes in the past that Union can run pretty slowly on large datasets and large collections. A typical use case is a large text processing job.

Hopefully RapidMiner's improved memory management will resolve this in the future. Until then, you might find the following tips handy.
 

If you do run into speed issues, here is what I do to speed things up.

  • Append is a tiny bit more efficient than Union, so the goal is to first create a single empty dataset that has ALL the attributes and then insert the data into it.
  • Looping through the collection and using Filter Examples to create a blank set would be slow, because Filter Examples is pretty inefficient for large operations. Join is the better operator here: if you add a single matching attribute (e.g. '99this_att_doesnt_exist') to both a small generated dataset and each set in the collection, joining on it quickly generates an empty set, which you can then run through your original Union Append method.
  • Next, loop over the collection again and left join each example set with the empty dataset to add all the attributes. Because you have already looped once (the first pass leaves the number of sets in a macro), you can parallelize this pass with the standard Loop operator until Loop Collection itself is parallel.
  • Finally, Append the resulting example sets together (all the attributes now match).
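
The steps above can be sketched in plain Python as follows. This is only a model of the idea, not the process itself: example sets are assumed to be lists of dicts, `None` stands in for a missing value, pass 1 simply collects attribute names instead of building the empty set through joins, and the names `all_attributes`, `pad` and `fast_union_append` are invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

Row = Dict[str, Any]
ExampleSet = List[Row]


def all_attributes(collection: List[ExampleSet]) -> List[str]:
    """Pass 1: the 'empty dataset', reduced here to its attribute names."""
    names: List[str] = []
    for example_set in collection:
        for row in example_set:
            for name in row:
                if name not in names:
                    names.append(name)
    return names


def pad(example_set: ExampleSet, names: List[str]) -> ExampleSet:
    """Model of the left join: keep every row, add absent attributes as None."""
    return [{name: row.get(name) for name in names} for row in example_set]


def fast_union_append(collection: List[ExampleSet]) -> ExampleSet:
    names = all_attributes(collection)
    # Pass 2: each set is padded independently, so the work can run in parallel.
    with ThreadPoolExecutor() as pool:
        padded = list(pool.map(lambda es: pad(es, names), collection))
    # Final step: a plain append, valid because all attributes now match.
    return [row for example_set in padded for row in example_set]


collection = [
    [{"id": "id_1", "att1": 1.0}],
    [{"id": "id_1", "att2": 2.0}],
]
print(fast_union_append(collection))
# [{'id': 'id_1', 'att1': 1.0, 'att2': None}, {'id': 'id_1', 'att1': None, 'att2': 2.0}]
```

The point of the design is that the expensive merge is done once on empty data, and the full data only passes through a cheap per-set step plus one append.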

It's a little complex when explained through bullet points, so here's an example process. The techniques should also be applicable (with a few edits) to Radoop, so if you need to scale further, please do.

Note: if you happen to have an attribute named '99this_att_doesnt_exist' in your source collection, you'll need to edit the 'Fast Performance Union Append' subprocess. I chose that attribute name because I thought it unlikely that anyone would use it.
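
If you would rather check for a clash than rely on the name being unlikely, the idea can be sketched like this. It is a hypothetical helper in Python, not something the building block does: `pick_join_key` and its suffix scheme are assumptions made for illustration, and example sets are again modelled as lists of dicts.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
ExampleSet = List[Row]


def pick_join_key(
    collection: List[ExampleSet],
    base: str = "99this_att_doesnt_exist",
) -> str:
    """Return a dummy join attribute name that no example set already uses."""
    used = {name for example_set in collection for row in example_set for name in row}
    candidate, suffix = base, 0
    while candidate in used:
        # Name is taken: try base_1, base_2, ... until one is free.
        suffix += 1
        candidate = f"{base}_{suffix}"
    return candidate


print(pick_join_key([[{"a": 1}]]))
# 99this_att_doesnt_exist
print(pick_join_key([[{"99this_att_doesnt_exist": 1}]]))
# 99this_att_doesnt_exist_1
```

In the process below the name is hard-coded, so a different name would have to be entered by hand in the three Generate Empty Attribute operators, the key attributes of both Join operators, and the final Select Attributes.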

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Example Collection Set" width="90" x="112" y="136">
        <process expanded="true">
          <operator activated="true" class="generate_massive_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Massive Data" width="90" x="112" y="136">
            <parameter key="number_attributes" value="60"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="7.4.000" expanded="true" height="82" name="Generate ID" width="90" x="246" y="136">
            <parameter key="create_nominal_ids" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:loop" compatibility="7.4.000" expanded="true" height="82" name="Loop (6)" width="90" x="380" y="136">
            <parameter key="number_of_iterations" value="20"/>
            <process expanded="true">
              <operator activated="true" class="select_by_random" compatibility="7.4.000" expanded="true" height="82" name="Select by Random" width="90" x="313" y="34">
                <parameter key="use_fixed_number_of_attributes" value="true"/>
                <parameter key="number_of_attributes" value="5"/>
                <description align="center" color="transparent" colored="false" width="126">5 attributes in each collection.</description>
              </operator>
              <connect from_port="input 1" to_op="Select by Random" to_port="example set input"/>
              <connect from_op="Select by Random" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">20 IOObjects in the collection.</description>
          </operator>
          <operator activated="true" class="collect" compatibility="7.4.000" expanded="true" height="82" name="Collect" width="90" x="648" y="34">
            <parameter key="unfold" value="true"/>
          </operator>
          <connect from_op="Generate Massive Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Loop (6)" to_port="input 1"/>
          <connect from_op="Loop (6)" from_port="output 1" to_op="Collect" to_port="input 1"/>
          <connect from_op="Collect" from_port="collection" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="136"/>
      <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Fast Performance Union Append" width="90" x="514" y="136">
        <process expanded="true">
          <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="45" y="187"/>
          <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Union Append" width="90" x="246" y="34">
            <process expanded="true">
              <operator activated="true" class="loop_collection" compatibility="7.4.000" expanded="true" height="82" name="Output (4)" width="90" x="112" y="34">
                <parameter key="set_iteration_macro" value="true"/>
                <parameter key="macro_name" value="collectionsNum"/>
                <process expanded="true">
                  <operator activated="true" class="generate_data_user_specification" compatibility="7.4.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="289">
                    <list key="attribute_values"/>
                    <list key="set_additional_roles"/>
                  </operator>
                  <operator activated="true" class="generate_empty_attribute" compatibility="7.4.000" expanded="true" height="82" name="Generate Empty Attribute" width="90" x="246" y="187">
                    <parameter key="name" value="99this_att_doesnt_exist"/>
                    <parameter key="value_type" value="integer"/>
                  </operator>
                  <operator activated="true" class="generate_empty_attribute" compatibility="7.4.000" expanded="true" height="82" name="Generate Empty Attribute (2)" width="90" x="179" y="34">
                    <parameter key="name" value="99this_att_doesnt_exist"/>
                    <parameter key="value_type" value="integer"/>
                  </operator>
                  <operator activated="true" class="join" compatibility="7.4.000" expanded="true" height="82" name="Join" width="90" x="447" y="85">
                    <parameter key="use_id_attribute_as_key" value="false"/>
                    <list key="key_attributes">
                      <parameter key="99this_att_doesnt_exist" value="99this_att_doesnt_exist"/>
                    </list>
                    <description align="center" color="transparent" colored="false" width="126">Use the pretend attribute to join the datasets together.</description>
                  </operator>
                  <operator activated="true" class="branch" compatibility="7.4.000" expanded="true" height="82" name="Branch (2)" width="90" x="648" y="34">
                    <parameter key="condition_type" value="expression"/>
                    <parameter key="expression" value="%{collectionsNum}==1"/>
                    <process expanded="true">
                      <connect from_port="condition" to_port="input 1"/>
                      <portSpacing port="source_condition" spacing="0"/>
                      <portSpacing port="source_input 1" spacing="0"/>
                      <portSpacing port="sink_input 1" spacing="0"/>
                      <portSpacing port="sink_input 2" spacing="0"/>
                    </process>
                    <process expanded="true">
                      <operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
                        <parameter key="name" value="LoopData"/>
                      </operator>
                      <operator activated="true" class="union" compatibility="7.4.000" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
                      <connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
                      <connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
                      <connect from_op="Union (2)" from_port="union" to_port="input 1"/>
                      <portSpacing port="source_condition" spacing="0"/>
                      <portSpacing port="source_input 1" spacing="0"/>
                      <portSpacing port="sink_input 1" spacing="0"/>
                      <portSpacing port="sink_input 2" spacing="0"/>
                    </process>
                  </operator>
                  <operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember (5)" width="90" x="849" y="34">
                    <parameter key="name" value="LoopData"/>
                  </operator>
                  <connect from_port="single" to_op="Generate Empty Attribute (2)" to_port="example set input"/>
                  <connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Empty Attribute" to_port="example set input"/>
                  <connect from_op="Generate Empty Attribute" from_port="example set output" to_op="Join" to_port="right"/>
                  <connect from_op="Generate Empty Attribute (2)" from_port="example set output" to_op="Join" to_port="left"/>
                  <connect from_op="Join" from_port="join" to_op="Branch (2)" to_port="condition"/>
                  <connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
                  <connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
                  <portSpacing port="source_single" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select (6)" width="90" x="313" y="34">
                <parameter key="index" value="%{collectionsNum}"/>
              </operator>
              <operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember (3)" width="90" x="581" y="34">
                <parameter key="name" value="LoopDataAtts"/>
              </operator>
              <connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
              <connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
              <connect from_op="Select (6)" from_port="selected" to_op="Remember (3)" to_port="store"/>
              <connect from_op="Remember (3)" from_port="stored" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">First pass get a blank dataset with all collection attributes.</description>
          </operator>
          <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="103" name="Union Append (2)" width="90" x="447" y="187">
            <process expanded="true">
              <operator activated="true" class="concurrency:loop" compatibility="7.4.000" expanded="true" height="82" name="Loop" width="90" x="112" y="34">
                <parameter key="number_of_iterations" value="%{collectionsNum}"/>
                <process expanded="true">
                  <operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall (4)" width="90" x="45" y="238">
                    <parameter key="name" value="LoopDataAtts"/>
                    <parameter key="remove_from_store" value="false"/>
                  </operator>
                  <operator activated="true" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select" width="90" x="45" y="34">
                    <parameter key="index" value="%{iteration}"/>
                  </operator>
                  <operator activated="true" class="generate_empty_attribute" compatibility="7.4.000" expanded="true" height="82" name="Generate Empty Attribute (3)" width="90" x="179" y="34">
                    <parameter key="name" value="99this_att_doesnt_exist"/>
                    <parameter key="value_type" value="integer"/>
                  </operator>
                  <operator activated="true" class="join" compatibility="7.4.000" expanded="true" height="82" name="Join (3)" width="90" x="447" y="136">
                    <parameter key="join_type" value="left"/>
                    <parameter key="use_id_attribute_as_key" value="false"/>
                    <list key="key_attributes">
                      <parameter key="99this_att_doesnt_exist" value="99this_att_doesnt_exist"/>
                    </list>
                    <description align="center" color="transparent" colored="false" width="126">Note this is now a left join. We want to join the old atts with the new ones.</description>
                  </operator>
                  <connect from_port="input 1" to_op="Select" to_port="collection"/>
                  <connect from_op="Recall (4)" from_port="result" to_op="Join (3)" to_port="right"/>
                  <connect from_op="Select" from_port="selected" to_op="Generate Empty Attribute (3)" to_port="example set input"/>
                  <connect from_op="Generate Empty Attribute (3)" from_port="example set output" to_op="Join (3)" to_port="left"/>
                  <connect from_op="Join (3)" from_port="join" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <description align="center" color="transparent" colored="false" width="126">Because we have already looped through once we can now use a standard loop with parallel execution.</description>
              </operator>
              <operator activated="true" class="append" compatibility="7.4.000" expanded="true" height="82" name="Append" width="90" x="313" y="34"/>
              <connect from_port="in 2" to_op="Loop" to_port="input 1"/>
              <connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
              <connect from_op="Append" from_port="merged set" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="source_in 3" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Next join the collections to the empty dataset with all attributes and then append them together.</description>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="99this_att_doesnt_exist"/>
            <parameter key="invert_selection" value="true"/>
            <description align="center" color="transparent" colored="false" width="126">Remove the 'join attribute' 99this_att_doesnt_exist</description>
          </operator>
          <connect from_port="in 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Union Append" to_port="in 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Union Append (2)" to_port="in 2"/>
          <connect from_op="Union Append" from_port="out 1" to_op="Union Append (2)" to_port="in 1"/>
          <connect from_op="Union Append (2)" from_port="out 1" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="order_attributes" compatibility="7.4.000" expanded="true" height="82" name="Reorder Attributes" width="90" x="715" y="136">
        <parameter key="sort_mode" value="alphabetically"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Union Append (3)" width="90" x="514" y="340">
        <process expanded="true">
          <operator activated="true" class="loop_collection" compatibility="7.4.000" expanded="true" height="82" name="Output (3)" width="90" x="380" y="34">
            <parameter key="set_iteration_macro" value="true"/>
            <process expanded="true">
              <operator activated="false" breakpoints="after" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select (2)" width="90" x="112" y="34">
                <parameter key="index" value="%{iteration}"/>
              </operator>
              <operator activated="true" class="branch" compatibility="7.4.000" expanded="true" height="82" name="Branch (3)" width="90" x="313" y="34">
                <parameter key="condition_type" value="expression"/>
                <parameter key="expression" value="%{iteration}==1"/>
                <process expanded="true">
                  <connect from_port="condition" to_port="input 1"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall (2)" width="90" x="45" y="187">
                    <parameter key="name" value="LoopData"/>
                  </operator>
                  <operator activated="true" class="union" compatibility="7.4.000" expanded="true" height="82" name="Union (3)" width="90" x="179" y="34"/>
                  <connect from_port="condition" to_op="Union (3)" to_port="example set 1"/>
                  <connect from_op="Recall (2)" from_port="result" to_op="Union (3)" to_port="example set 2"/>
                  <connect from_op="Union (3)" from_port="union" to_port="input 1"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember (2)" width="90" x="581" y="34">
                <parameter key="name" value="LoopData"/>
              </operator>
              <connect from_port="single" to_op="Branch (3)" to_port="condition"/>
              <connect from_op="Branch (3)" from_port="input 1" to_op="Remember (2)" to_port="store"/>
              <connect from_op="Remember (2)" from_port="stored" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select (3)" width="90" x="581" y="34">
            <parameter key="index" value="%{iteration}"/>
          </operator>
          <connect from_port="in 1" to_op="Output (3)" to_port="collection"/>
          <connect from_op="Output (3)" from_port="output 1" to_op="Select (3)" to_port="collection"/>
          <connect from_op="Select (3)" from_port="selected" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log" width="90" x="715" y="340">
        <list key="log">
          <parameter key="Fast Append" value="operator.Fast Performance Union Append.value.execution-time"/>
          <parameter key="Union Append" value="operator.Union Append (3).value.execution-time"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">On small datasets the Union Append is much quicker. On larger datasets the Fast Performance Union is better.</description>
      </operator>
      <operator activated="true" class="order_attributes" compatibility="7.4.000" expanded="true" height="82" name="Reorder Attributes (2)" width="90" x="916" y="238">
        <parameter key="sort_mode" value="alphabetically"/>
      </operator>
      <connect from_op="Example Collection Set" from_port="out 1" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Fast Performance Union Append" to_port="in 1"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="Union Append (3)" to_port="in 1"/>
      <connect from_op="Fast Performance Union Append" from_port="out 1" to_op="Reorder Attributes" to_port="example set input"/>
      <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
      <connect from_op="Union Append (3)" from_port="out 1" to_op="Log" to_port="through 1"/>
      <connect from_op="Log" from_port="through 1" to_op="Reorder Attributes (2)" to_port="example set input"/>
      <connect from_op="Reorder Attributes (2)" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

 
