[SOLVED] Joining ExampleSets of a collection

tennenrishin Member Posts: 177 Contributor II
edited November 2018 in Help
Is there any way to perform a join of all ExampleSets in a collection of ExampleSets?

Answers

  • tennenrishin Member Posts: 177 Contributor II
    Here is a solution that uses Remember/Recall operators. It feels like a hack. Does anyone have a better idea?
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
       <process expanded="true" height="479" width="924">
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.006" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="75">
           <list key="attribute_values"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_id" compatibility="5.2.006" expanded="true" height="76" name="Generate ID" width="90" x="179" y="75"/>
         <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="75">
           <parameter key="invert_filter" value="true"/>
         </operator>
         <operator activated="true" class="remember" compatibility="5.2.006" expanded="true" height="60" name="Remember (3)" width="90" x="447" y="75">
           <parameter key="name" value="partial_join"/>
           <parameter key="io_object" value="ExampleSet"/>
         </operator>
         <operator activated="true" class="loop_collection" compatibility="5.2.006" expanded="true" height="60" name="Loop Collection" width="90" x="581" y="30">
           <process expanded="true" height="479" width="924">
             <operator activated="true" class="recall" compatibility="5.2.006" expanded="true" height="60" name="Recall" width="90" x="112" y="75">
               <parameter key="name" value="partial_join"/>
               <parameter key="io_object" value="ExampleSet"/>
             </operator>
             <operator activated="true" class="join" compatibility="5.2.006" expanded="true" height="76" name="Join" width="90" x="246" y="30">
               <parameter key="join_type" value="outer"/>
               <list key="key_attributes"/>
             </operator>
             <operator activated="true" class="remember" compatibility="5.2.006" expanded="true" height="60" name="Remember" width="90" x="380" y="30">
               <parameter key="name" value="partial_join"/>
               <parameter key="io_object" value="ExampleSet"/>
             </operator>
             <connect from_port="single" to_op="Join" to_port="left"/>
             <connect from_op="Recall" from_port="result" to_op="Join" to_port="right"/>
             <connect from_op="Join" from_port="join" to_op="Remember" to_port="store"/>
             <portSpacing port="source_single" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="recall" compatibility="5.2.006" expanded="true" height="60" name="Recall (2)" width="90" x="720" y="30">
           <parameter key="name" value="partial_join"/>
           <parameter key="io_object" value="ExampleSet"/>
         </operator>
         <connect from_port="input 1" to_op="Loop Collection" to_port="collection"/>
         <connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
         <connect from_op="Filter Examples" from_port="example set output" to_op="Remember (3)" to_port="store"/>
         <connect from_op="Recall (2)" from_port="result" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="source_input 2" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
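
The pattern in the process above (store an empty seed under "partial_join", then recall it, join it with each element of the collection, and remember the result again) is a fold over the collection. A minimal Python sketch of the same idea follows. It is only an illustration, not RapidMiner code: the helper names `outer_join` and `join_collection` are hypothetical, rows are plain dicts, and it assumes every set has a unique `id` per row and that attribute names do not clash between sets.

```python
from functools import reduce


def outer_join(left, right, key="id"):
    """Full outer join of two lists of dict rows on `key` (hypothetical helper).

    Assumes `key` is unique within each input. If both sides carry the same
    attribute name, the value from `right` wins.
    """
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    # Give every row the full set of columns; missing values become None.
    columns = {c for row in merged.values() for c in row}
    return [
        {c: row.get(c) for c in sorted(columns)}
        for _, row in sorted(merged.items())
    ]


def join_collection(collection, key="id"):
    """Fold a collection of row lists into one outer-joined list."""
    # The empty list plays the role of the empty seed ExampleSet that is
    # stored under "partial_join" before the loop starts. Each step joins
    # the current set (left) with the partial result (right).
    return reduce(
        lambda partial, current: outer_join(current, partial, key),
        collection,
        [],
    )


if __name__ == "__main__":
    sets = [
        [{"id": 1, "a": 10}, {"id": 2, "a": 20}],
        [{"id": 2, "b": "x"}, {"id": 3, "b": "y"}],
    ]
    for row in join_collection(sets):
        print(row)
```

Seen this way, Remember/Recall is just how the accumulator is carried from one loop iteration to the next.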
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, that's a way to go.
  • baglivofabricio Member Posts: 2 Contributor I

    Hi! I tried this method, but I get a "potential problem detected" warning. The right input of the Join operator, which comes from the Recall operator, reports that the meta data is underspecified and the precondition cannot be checked. Has anyone else had this issue? I am looking for a way to solve it but cannot find one.

    Best!

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="7.6.001" expanded="true" height="82" name="Filter Example Range" width="90" x="179" y="34">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="50"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris (2)" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="7.6.001" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="179" y="136">
    <parameter key="first_example" value="20"/>
    <parameter key="last_example" value="70"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris (3)" width="90" x="45" y="238">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="7.6.001" expanded="true" height="82" name="Filter Example Range (3)" width="90" x="179" y="238">
    <parameter key="first_example" value="30"/>
    <parameter key="last_example" value="80"/>
    </operator>
    <operator activated="true" class="collect" compatibility="7.6.001" expanded="true" height="124" name="Collect" width="90" x="313" y="34"/>
    <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="136"/>
    <operator activated="true" class="remove_duplicates" compatibility="7.6.001" expanded="true" height="103" name="Remove Duplicates" width="90" x="648" y="34"/>
    <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="581" y="238">
    <list key="aggregation_attributes">
    <parameter key="id" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="id"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="7.6.001" expanded="true" height="68" name="Extract Macro" width="90" x="715" y="238">
    <parameter key="macro" value="maxCount"/>
    <parameter key="macro_type" value="statistics"/>
    <parameter key="statistics" value="max"/>
    <parameter key="attribute_name" value="count(id)"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="849" y="238">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="count(id).eq.%{maxCount}"/>
    </list>
    </operator>
    <operator activated="true" class="join" compatibility="7.6.001" expanded="true" height="82" name="Join" width="90" x="849" y="34">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Collect" to_port="input 1"/>
    <connect from_op="Retrieve Iris (2)" from_port="output" to_op="Filter Example Range (2)" to_port="example set input"/>
    <connect from_op="Filter Example Range (2)" from_port="example set output" to_op="Collect" to_port="input 2"/>
    <connect from_op="Retrieve Iris (3)" from_port="output" to_op="Filter Example Range (3)" to_port="example set input"/>
    <connect from_op="Filter Example Range (3)" from_port="example set output" to_op="Collect" to_port="input 3"/>
    <connect from_op="Collect" from_port="collection" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Remove Duplicates" to_port="example set input"/>
    <connect from_op="Remove Duplicates" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Well, this solution doesn't use remember/recall operators. I don't know if it is any simpler though. In any case it has been a great RM exercise!

     

    Best,

    Sebastian
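
Read from its XML, the process above appends all sets, counts how often each id occurs, keeps only the ids with the highest count, and joins those back onto the de-duplicated rows. A rough Python sketch of that logic follows. It is an illustration only: the function name `rows_with_max_count` is hypothetical, rows are plain dicts, duplicates are dropped per id (a simplification of Remove Duplicates), and the final step assumes the Join operator's default inner join.

```python
from collections import Counter


def rows_with_max_count(collection, key="id"):
    """Keep one row per key for the keys that occur most often (hypothetical helper)."""
    # Append: stack all sets into one list of rows.
    appended = [row for example_set in collection for row in example_set]
    # Aggregate: count(id) grouped by id.
    counts = Counter(row[key] for row in appended)
    if not counts:
        return []
    # Extract Macro: maxCount is the largest count found.
    max_count = max(counts.values())
    # Filter Examples: keep the ids whose count equals maxCount.
    keep = {k for k, n in counts.items() if n == max_count}
    # Remove Duplicates + inner Join: first row seen for each kept id.
    result = {}
    for row in appended:
        if row[key] in keep:
            result.setdefault(row[key], row)
    return list(result.values())


if __name__ == "__main__":
    # Same ranges as the three Filter Example Range operators.
    ranges = [(1, 50), (20, 70), (30, 80)]
    sets = [[{"id": i} for i in range(a, b + 1)] for a, b in ranges]
    print([row["id"] for row in rows_with_max_count(sets)])
```

Note that the highest count equals the number of sets only when at least one id occurs in every set; otherwise this keeps the ids shared by the most sets rather than by all of them.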

  • baglivofabricio Member Posts: 2 Contributor I

    Thanks SGolbert!

     

    I will try it, best!

     

    Fabricio
