Handling collections: behavior of union vs append

aruberutou Member Posts: 23 Contributor II
edited November 2018 in Help

Hi,

Seems things have been picking up quite a bit lately. I especially like the new search features in 7.2.

Right now, "append" seems to be the only operator that will directly merge the ExampleSets within a given collection.

However, "union" might sometimes be the preferred option. Unfortunately, it does not behave the same as append with respect to collections. This seems like a minor oversight, and I hope it's addressed soon.

In the meantime, I have hacked together this building block that emulates this functionality.

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.2.000" expanded="true" height="82" name="Union Append" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="loop_collection" compatibility="7.2.000" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
<parameter key="set_iteration_macro" value="true"/>
<process expanded="true">
<operator activated="false" breakpoints="after" class="select" compatibility="7.2.000" expanded="true" height="68" name="Select (5)" width="90" x="112" y="34">
<parameter key="index" value="%{iteration}"/>
</operator>
<operator activated="true" class="branch" compatibility="7.2.000" expanded="true" height="82" name="Branch (2)" width="90" x="313" y="34">
<parameter key="condition_type" value="expression"/>
<parameter key="expression" value="%{iteration}==1"/>
<process expanded="true">
<connect from_port="condition" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="recall" compatibility="7.2.000" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
<parameter key="name" value="LoopData"/>
</operator>
<operator activated="true" class="union" compatibility="7.2.000" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
<connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
<connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
<connect from_op="Union (2)" from_port="union" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="7.2.000" expanded="true" height="68" name="Remember (5)" width="90" x="581" y="34">
<parameter key="name" value="LoopData"/>
</operator>
<connect from_port="single" to_op="Branch (2)" to_port="condition"/>
<connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
<connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select" compatibility="7.2.000" expanded="true" height="68" name="Select (6)" width="90" x="179" y="34">
<parameter key="index" value="%{iteration}"/>
</operator>
<connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
<connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
<connect from_op="Select (6)" from_port="selected" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
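
In case it helps to see the idea outside RapidMiner, the loop above amounts to folding Union over the collection. Here is a minimal Python sketch, assuming plain lists of dicts as a stand-in for ExampleSets. The function names are made up for illustration, and row or attribute order may differ from what the Union operator actually produces.

```python
from functools import reduce

# A "table" is a list of dict rows: an illustrative stand-in for an
# ExampleSet, not a RapidMiner API.


def attributes(table):
    """Return the attribute names of a table in first-seen order."""
    names = {}
    for row in table:
        for name in row:
            names.setdefault(name)
    return list(names)


def union(first, second):
    """Stack two tables, keeping the superset of their attributes.

    Attributes a row does not have are filled with None (missing).
    """
    rows = first + second
    names = attributes(rows)
    return [{name: row.get(name) for name in names} for row in rows]


def union_collection(collection):
    """Fold union over the collection, one table at a time.

    The accumulator plays the role of the Remember/Recall pair
    in the building block.
    """
    return reduce(union, collection, [])


collection = [
    [{"id": 1, "a": 0.5}],
    [{"id": 2, "b": "x"}],
    [{"id": 3, "a": 1.5, "c": True}],
]
merged = union_collection(collection)
for row in merged:
    print(row)
```

Each step re-pads everything merged so far, which is fine for small collections but does repeated work on large ones.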

Hope someone else finds this useful too.

Cheers,

Best Answer

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Solution Accepted

    Very clean solution and a very good building block to add.

    I know from building similar processes in the past that Union can run pretty slowly at times, especially on large datasets and collections.

    Use case: large text processing jobs.

    Hopefully this can be resolved by RapidMiner's improved memory management in the future. Until then, you might find the following tips handy.

    If you do run into speed issues, here is what I do to speed things up.

    • Append is a tiny bit more efficient than Union, so the goal is to first create a single empty dataset that has ALL attributes and then insert the data into that.
    • Looping through the collection and using Filter Examples to create a blank set would be slow, because Filter Examples is pretty inefficient for large operations. Join is a better operator here: if you join each collection entry to an empty dataset on a single placeholder attribute added to both (e.g. '99this_att_doesnt_exist'), you can quickly generate an empty set with all attributes using your original Union Append method.
    • Next, loop over the collection again and LEFT JOIN each collection entry to your empty dataset to add all the attributes in. Because you have already looped once, you can parallelize this pass using the standard Loop operator, at least until Loop Collection itself runs in parallel.
    • Finally, Append the collection entries together (all the attributes now match).
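
As a rough illustration of the steps above, here is a Python sketch with plain lists of dicts standing in for ExampleSets. It is not RapidMiner code, the function names are made up for this example, and it mirrors only the shape of the approach (collect all attributes first, pad each entry, then append), not the Join trick itself.

```python
# A "table" is a list of dict rows: an illustrative stand-in for an
# ExampleSet, not a RapidMiner API.


def all_attributes(collection):
    """Pass 1: collect every attribute name that occurs in any table."""
    names = {}
    for table in collection:
        for row in table:
            for name in row:
                names.setdefault(name)
    return list(names)


def pad(table, names):
    """Pass 2, per table: give every row the full attribute set.

    Attributes the row lacks are filled with None (missing).
    """
    return [{name: row.get(name) for name in names} for row in table]


def fast_union_append(collection):
    names = all_attributes(collection)
    # Each table is padded independently of the others, which is why
    # this pass could run in parallel.
    padded = [pad(table, names) for table in collection]
    # Final step: a plain append, valid because all schemas now match.
    return [row for table in padded for row in table]


collection = [
    [{"id": 1, "a": 0.5}],
    [{"id": 2, "b": "x"}],
]
merged = fast_union_append(collection)
for row in merged:
    print(row)
```

The point of the design is that the full attribute set is computed once, so each entry is padded a single time instead of being re-merged on every iteration.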

    It's a little complex when explained through bullet points, so here's an example process. The techniques should also be applicable (with a few edits) to Radoop, so if you need to scale further, please do.

    Note: if you happen to have an attribute in your source collection named '99this_att_doesnt_exist', then you'll need to edit the 'Fast Performance Union Append' subprocess in the building block. The attribute name was chosen because I thought it unlikely that anyone would have one.

    <?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Example Collection Set" width="90" x="112" y="136">
    <process expanded="true">
    <operator activated="true" class="generate_massive_data" compatibility="7.4.000" expanded="true" height="68" name="Generate Massive Data" width="90" x="112" y="136">
    <parameter key="number_attributes" value="60"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="7.4.000" expanded="true" height="82" name="Generate ID" width="90" x="246" y="136">
    <parameter key="create_nominal_ids" value="true"/>
    </operator>
    <operator activated="true" class="concurrency:loop" compatibility="7.4.000" expanded="true" height="82" name="Loop (6)" width="90" x="380" y="136">
    <parameter key="number_of_iterations" value="20"/>
    <process expanded="true">
    <operator activated="true" class="select_by_random" compatibility="7.4.000" expanded="true" height="82" name="Select by Random" width="90" x="313" y="34">
    <parameter key="use_fixed_number_of_attributes" value="true"/>
    <parameter key="number_of_attributes" value="5"/>
    <description align="center" color="transparent" colored="false" width="126">5 attributes in each collection.</description>
    </operator>
    <connect from_port="input 1" to_op="Select by Random" to_port="example set input"/>
    <connect from_op="Select by Random" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">20 IOObjects in the collection.</description>
    </operator>
    <operator activated="true" class="collect" compatibility="7.4.000" expanded="true" height="82" name="Collect" width="90" x="648" y="34">
    <parameter key="unfold" value="true"/>
    </operator>
    <connect from_op="Generate Massive Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Loop (6)" to_port="input 1"/>
    <connect from_op="Loop (6)" from_port="output 1" to_op="Collect" to_port="input 1"/>
    <connect from_op="Collect" from_port="collection" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="136"/>
    <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Fast Performance Union Append" width="90" x="514" y="136">
    <process expanded="true">
    <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="45" y="187"/>
    <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Union Append" width="90" x="246" y="34">
    <process expanded="true">
    <operator activated="true" class="loop_collection" compatibility="7.4.000" expanded="true" height="82" name="Output (4)" width="90" x="112" y="34">
    <parameter key="set_iteration_macro" value="true"/>
    <parameter key="macro_name" value="collectionsNum"/>
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.4.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="289">
    <list key="attribute_values"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_empty_attribute" compatibility="7.4.000" expanded="true" height="82" name="Generate Empty Attribute" width="90" x="246" y="187">
    <parameter key="name" value="99this_att_doesnt_exist"/>
    <parameter key="value_type" value="integer"/>
    </operator>
    <operator activated="true" class="generate_empty_attribute" compatibility="7.4.000" expanded="true" height="82" name="Generate Empty Attribute (2)" width="90" x="179" y="34">
    <parameter key="name" value="99this_att_doesnt_exist"/>
    <parameter key="value_type" value="integer"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.4.000" expanded="true" height="82" name="Join" width="90" x="447" y="85">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="99this_att_doesnt_exist" value="99this_att_doesnt_exist"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Use the pretend attribute to join the datasets together.</description>
    </operator>
    <operator activated="true" class="branch" compatibility="7.4.000" expanded="true" height="82" name="Branch (2)" width="90" x="648" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{collectionsNum}==1"/>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
    <parameter key="name" value="LoopData"/>
    </operator>
    <operator activated="true" class="union" compatibility="7.4.000" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
    <connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
    <connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
    <connect from_op="Union (2)" from_port="union" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember (5)" width="90" x="849" y="34">
    <parameter key="name" value="LoopData"/>
    </operator>
    <connect from_port="single" to_op="Generate Empty Attribute (2)" to_port="example set input"/>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Empty Attribute" to_port="example set input"/>
    <connect from_op="Generate Empty Attribute" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Generate Empty Attribute (2)" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Branch (2)" to_port="condition"/>
    <connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
    <connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select (6)" width="90" x="313" y="34">
    <parameter key="index" value="%{collectionsNum}"/>
    </operator>
    <operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember (3)" width="90" x="581" y="34">
    <parameter key="name" value="LoopDataAtts"/>
    </operator>
    <connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
    <connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
    <connect from_op="Select (6)" from_port="selected" to_op="Remember (3)" to_port="store"/>
    <connect from_op="Remember (3)" from_port="stored" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">First pass get a blank dataset with all collection attributes.</description>
    </operator>
    <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="103" name="Union Append (2)" width="90" x="447" y="187">
    <process expanded="true">
    <operator activated="true" class="concurrency:loop" compatibility="7.4.000" expanded="true" height="82" name="Loop" width="90" x="112" y="34">
    <parameter key="number_of_iterations" value="%{collectionsNum}"/>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall (4)" width="90" x="45" y="238">
    <parameter key="name" value="LoopDataAtts"/>
    <parameter key="remove_from_store" value="false"/>
    </operator>
    <operator activated="true" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select" width="90" x="45" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <operator activated="true" class="generate_empty_attribute" compatibility="7.4.000" expanded="true" height="82" name="Generate Empty Attribute (3)" width="90" x="179" y="34">
    <parameter key="name" value="99this_att_doesnt_exist"/>
    <parameter key="value_type" value="integer"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.4.000" expanded="true" height="82" name="Join (3)" width="90" x="447" y="136">
    <parameter key="join_type" value="left"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="99this_att_doesnt_exist" value="99this_att_doesnt_exist"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Note this is now a left join. We want to join the old atts with the new ones.</description>
    </operator>
    <connect from_port="input 1" to_op="Select" to_port="collection"/>
    <connect from_op="Recall (4)" from_port="result" to_op="Join (3)" to_port="right"/>
    <connect from_op="Select" from_port="selected" to_op="Generate Empty Attribute (3)" to_port="example set input"/>
    <connect from_op="Generate Empty Attribute (3)" from_port="example set output" to_op="Join (3)" to_port="left"/>
    <connect from_op="Join (3)" from_port="join" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Because we have already looped through once we can now use a standard loop with parallel execution.</description>
    </operator>
    <operator activated="true" class="append" compatibility="7.4.000" expanded="true" height="82" name="Append" width="90" x="313" y="34"/>
    <connect from_port="in 2" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="source_in 3" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Next join the collections to the empty dataset with all attributes and then append them together.</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="187">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="99this_att_doesnt_exist"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Remove the 'join attribute' 99this_att_doesnt_exist</description>
    </operator>
    <connect from_port="in 1" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Union Append" to_port="in 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Union Append (2)" to_port="in 2"/>
    <connect from_op="Union Append" from_port="out 1" to_op="Union Append (2)" to_port="in 1"/>
    <connect from_op="Union Append (2)" from_port="out 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="7.4.000" expanded="true" height="82" name="Reorder Attributes" width="90" x="715" y="136">
    <parameter key="sort_mode" value="alphabetically"/>
    </operator>
    <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="82" name="Union Append (3)" width="90" x="514" y="340">
    <process expanded="true">
    <operator activated="true" class="loop_collection" compatibility="7.4.000" expanded="true" height="82" name="Output (3)" width="90" x="380" y="34">
    <parameter key="set_iteration_macro" value="true"/>
    <process expanded="true">
    <operator activated="false" breakpoints="after" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select (2)" width="90" x="112" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <operator activated="true" class="branch" compatibility="7.4.000" expanded="true" height="82" name="Branch (3)" width="90" x="313" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{iteration}==1"/>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall (2)" width="90" x="45" y="187">
    <parameter key="name" value="LoopData"/>
    </operator>
    <operator activated="true" class="union" compatibility="7.4.000" expanded="true" height="82" name="Union (3)" width="90" x="179" y="34"/>
    <connect from_port="condition" to_op="Union (3)" to_port="example set 1"/>
    <connect from_op="Recall (2)" from_port="result" to_op="Union (3)" to_port="example set 2"/>
    <connect from_op="Union (3)" from_port="union" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember (2)" width="90" x="581" y="34">
    <parameter key="name" value="LoopData"/>
    </operator>
    <connect from_port="single" to_op="Branch (3)" to_port="condition"/>
    <connect from_op="Branch (3)" from_port="input 1" to_op="Remember (2)" to_port="store"/>
    <connect from_op="Remember (2)" from_port="stored" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select" compatibility="7.4.000" expanded="true" height="68" name="Select (3)" width="90" x="581" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <connect from_port="in 1" to_op="Output (3)" to_port="collection"/>
    <connect from_op="Output (3)" from_port="output 1" to_op="Select (3)" to_port="collection"/>
    <connect from_op="Select (3)" from_port="selected" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log" width="90" x="715" y="340">
    <list key="log">
    <parameter key="Fast Append" value="operator.Fast Performance Union Append.value.execution-time"/>
    <parameter key="Union Append" value="operator.Union Append (3).value.execution-time"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">On small datasets the Union Append is much quicker. On larger datasets the Fast Performance Union is better.</description>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="7.4.000" expanded="true" height="82" name="Reorder Attributes (2)" width="90" x="916" y="238">
    <parameter key="sort_mode" value="alphabetically"/>
    </operator>
    <connect from_op="Example Collection Set" from_port="out 1" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Fast Performance Union Append" to_port="in 1"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Union Append (3)" to_port="in 1"/>
    <connect from_op="Fast Performance Union Append" from_port="out 1" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
    <connect from_op="Union Append (3)" from_port="out 1" to_op="Log" to_port="through 1"/>
    <connect from_op="Log" from_port="through 1" to_op="Reorder Attributes (2)" to_port="example set input"/>
    <connect from_op="Reorder Attributes (2)" from_port="example set output" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

     

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Indeed - busy times here and in general among the RM users :)

    And thanks for your kind words. It is funny, but the new search - despite not being a very exciting feature on its own - is also one of my favorites in 7.2...

    I will forward this message to the engineers to look into Union. But for now, many thanks for the building block and happy mining,

    Ingo

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Just want to second the request for the "Union" operator to function like the "Append" operator when following a collection. I run into this issue all the time.

    Thanks!

    Scott

  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    Hello @sgenzer

    Thank you for such a clean solution. Would you be OK with us publishing this in our Building Blocks section?

    http://community.rapidminer.com/t5/Building-Blocks/bd-p/BB

    I'll provide a link and credit to you, obviously. Thank you!!

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    I agree it's a nice clean solution, but it's not mine.  :)  It's aruberutou's - see above.

    Scott

  • 781194025 Member Posts: 32 Contributor I
    OMG THANK YOU SO MUCH!! I HAVE SPENT 10+ HOURS JUST TRYING TO COMBINE DATA SETS THAT BOTH HAVE MISSING VALUES.. WHY IS THIS NOT INCLUDED IN RAPIDMINER!!