RapidMiner

Problem with combining all example set from IO Object Collection

SOLVED
Super Contributor


Hello everyone

 

I'm running a loop that creates one ExampleSet per iteration, so I end up with an IOObjectCollection on the output. My problem is joining all the example sets produced by Loop Attributes into a single example set. I've tried all the join operators, but I'm stuck. I set the attribute "No" as the ID, and its values are the same in every example set. For example, my data looks like this:

Example set 1:

No  att1
1
2

Example set 2:

No  att2
1
2

Example set 3:

No  att3
1
2

 

The result I want looks like this:

Example set:

No  att1  att2  att3
1
2

 

I've looked for a reference and found a similar post, but I'm still stuck. Here is the similar post: http://community.rapidminer.com/t5/Original-Rapid-I-Forum/Combining-Example-Set-Attributes/m-p/12879


22 REPLIES
RM Certified Expert

Re: Problem with combining all example set from IO Object Collection

You can append them all together, but first the attributes need to be renamed so that the datasets have the same structure (attribute names and data types). Try Rename by Generic Names followed by Append, and you should get a dataset that you can then transpose.

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Super Contributor

Re: Problem with combining all example set from IO Object Collection

I tried your recommendation, but an error appears: "duplicate attribute name". I put Rename by Generic Names inside the Loop Attributes operator, and Append and Transpose outside the loop.

Moderator

Re: Problem with combining all example set from IO Object Collection

I wouldn't put Rename by Generic Names inside the loop; I'd do it outside of the loop.

Super Contributor

Re: Problem with combining all example set from IO Object Collection

That gives an error too: "your connection is producing wrong type data". Maybe that's because the output of the loop is an IOObjectCollection, while Rename by Generic Names only expects an ExampleSet.

RMStaff
Solution
Accepted by topic author binsetyawan
‎05-15-2017 09:50 AM

Re: Problem with combining all example set from IO Object Collection

Hi,

 

I have attached an example process and the XML which should solve your problem.

Some key takeaways:

  1. The solution uses the Join operator and Remember / Recall within a Loop Collection.
  2. Joining needs an ID attribute. Either generate one or assign the ID role to an existing attribute, then make sure you use the desired join type.
  3. IDs need to have the same value type (e.g. numerical) in every ExampleSet. The operators under Blending -> Attributes -> Types can help here.
  4. A Join always needs two ExampleSets, so in the first iteration I Remember the first one.
  5. In each subsequent iteration of the loop, the Remembered dataset is Recalled, Joined with the current ExampleSet, and Remembered again.
  6. At the end, the final dataset can be Recalled outside of the Loop Collection.
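
The loop logic in points 4 to 6 can be sketched outside RapidMiner in plain Python. This is only an illustration of the Remember / Recall / Join pattern, not RapidMiner code: the function names (`normalize_ids`, `left_join`, `join_collection`) and the attribute values are made up for the example, and each ExampleSet is modelled as a dict that maps an ID to a row.

```python
def normalize_ids(example_set):
    """Cast every ID to int so that 1 and "1" refer to the same row (point 3)."""
    return {int(row_id): dict(row) for row_id, row in example_set.items()}


def attribute_names(example_set):
    """Collect all attribute names that occur in any row."""
    names = set()
    for row in example_set.values():
        names.update(row)
    return names


def left_join(left, right):
    """Keep every ID of `left`; add the attributes of `right` where the ID matches."""
    duplicates = attribute_names(left) & attribute_names(right)
    if duplicates:
        # Attribute names must be unique in the joined result.
        raise ValueError(f"duplicate attribute names: {sorted(duplicates)}")
    right_names = sorted(attribute_names(right))
    joined = {}
    for row_id, row in left.items():
        match = right.get(row_id, {})
        # IDs without a match get None for the attributes of `right`.
        joined[row_id] = {**row, **{name: match.get(name) for name in right_names}}
    return joined


def join_collection(collection):
    """Fold a list of example sets into one, mimicking Remember / Recall in a loop."""
    remembered = None
    for iteration, example_set in enumerate(collection, start=1):
        current = normalize_ids(example_set)
        if iteration == 1:
            remembered = current  # first iteration: only "Remember"
        else:
            remembered = left_join(remembered, current)  # "Recall", "Join", "Remember"
    return remembered


# Hypothetical data: three example sets sharing the ID values 1 and 2.
collection = [
    {1: {"att1": 0.5}, 2: {"att1": 1.5}},
    {"1": {"att2": 10}, "2": {"att2": 20}},  # IDs stored as text on purpose
    {1: {"att3": "a"}, 2: {"att3": "b"}},
]
combined = join_collection(collection)
```

The second example set stores its IDs as text to show why the value types need to agree before joining: without `normalize_ids`, the keys `1` and `"1"` would not match.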

Please keep in mind that Remember / Recall are great operators, but I do not recommend using them with huge datasets.

 

Best,

Edin

 

Here is the XML:

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data (2)" width="90" x="45" y="34">
        <parameter key="number_of_attributes" value="1"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="att1"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.5.001" expanded="true" height="124" name="Multiply" width="90" x="313" y="34"/>
      <operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (4)" width="90" x="447" y="238">
        <parameter key="old_name" value="att1"/>
        <parameter key="new_name" value="att3"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (3)" width="90" x="447" y="136">
        <parameter key="old_name" value="att1"/>
        <parameter key="new_name" value="att2"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" breakpoints="after" class="collect" compatibility="7.5.001" expanded="true" height="124" name="Collect" width="90" x="581" y="34"/>
      <operator activated="true" class="loop_collection" compatibility="7.5.001" expanded="true" height="68" name="Loop Collection (2)" width="90" x="715" y="34">
        <parameter key="set_iteration_macro" value="true"/>
        <process expanded="true">
          <operator activated="true" class="generate_id" compatibility="7.5.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="238"/>
          <operator activated="true" class="branch" compatibility="7.5.001" expanded="true" height="82" name="Branch (2)" width="90" x="514" y="238">
            <parameter key="condition_type" value="expression"/>
            <parameter key="expression" value="%{iteration}==1"/>
            <process expanded="true">
              <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (3)" width="90" x="45" y="34">
                <parameter key="name" value="dataset"/>
              </operator>
              <connect from_port="condition" to_op="Remember (3)" to_port="store"/>
              <portSpacing port="source_condition" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_input 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (3)" width="90" x="45" y="34">
                <parameter key="name" value="dataset"/>
              </operator>
              <operator activated="true" class="join" compatibility="7.5.001" expanded="true" height="82" name="Join (2)" width="90" x="179" y="85">
                <parameter key="join_type" value="left"/>
                <list key="key_attributes"/>
              </operator>
              <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (4)" width="90" x="313" y="85">
                <parameter key="name" value="dataset"/>
              </operator>
              <connect from_port="condition" to_op="Join (2)" to_port="right"/>
              <connect from_op="Recall (3)" from_port="result" to_op="Join (2)" to_port="left"/>
              <connect from_op="Join (2)" from_port="join" to_op="Remember (4)" to_port="store"/>
              <portSpacing port="source_condition" spacing="105"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_input 1" spacing="0"/>
            </process>
          </operator>
          <connect from_port="single" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Branch (2)" to_port="condition"/>
          <portSpacing port="source_single" spacing="189"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="315" resized="true" width="378" x="63" y="30">Either &lt;br/&gt;- Generate an ID&lt;br/&gt;- Set the Role for an attribute to ID&lt;br/&gt;&lt;br/&gt;Important is that the attribute names in the final exampleset must be unique&lt;br/&gt;&lt;br/&gt;In addition the value type (Numerical vs. Polynominal) of the ID attribute has to be the same for each ExampleSet</description>
        </process>
      </operator>
      <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (2)" width="90" x="849" y="34">
        <parameter key="name" value="dataset"/>
      </operator>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Collect" to_port="input 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Rename (3)" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 3" to_op="Rename (4)" to_port="example set input"/>
      <connect from_op="Rename (4)" from_port="example set output" to_op="Collect" to_port="input 3"/>
      <connect from_op="Rename (3)" from_port="example set output" to_op="Collect" to_port="input 2"/>
      <connect from_op="Collect" from_port="collection" to_op="Loop Collection (2)" to_port="collection"/>
      <connect from_op="Recall (2)" from_port="result" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

 


Super Contributor

Re: Problem with combining all example set from IO Object Collection

Thank you for the operator reference, the tips, and the example. I'll try it with the model I built.

P.S.: When I run your example, the result is still an object collection with several example sets.

 

Regards,

Bintang

Super Contributor

Re: Problem with combining all example set from IO Object Collection

I looked for another example and found a model similar to yours, and its result is what I'm looking for. But when I tried it with my model, an error appears at the Recall operator inside the Branch operator: "no object with name X was found during retrieval from the object store", even though I adjusted my model to match it.

 

Here is the XML code of the model I based my adjustments on:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.1.000-SNAPSHOT" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Subprocess" width="90" x="112" y="30">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="30">
            <list key="attribute_values">
              <parameter key="id" value="1"/>
              <parameter key="col1" value="48"/>
            </list>
            <list key="set_additional_roles">
              <parameter key="id" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="45" y="120">
            <list key="attribute_values">
              <parameter key="id" value="2"/>
              <parameter key="col1" value="4"/>
            </list>
            <list key="set_additional_roles">
              <parameter key="id" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="append" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Append" width="90" x="179" y="30"/>
          <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="45" y="210">
            <list key="attribute_values">
              <parameter key="id" value="1"/>
              <parameter key="col2" value="9"/>
            </list>
            <list key="set_additional_roles">
              <parameter key="id" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="45" y="300">
            <list key="attribute_values">
              <parameter key="id" value="2"/>
              <parameter key="col2" value="7"/>
            </list>
            <list key="set_additional_roles">
              <parameter key="id" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="append" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Append (2)" width="90" x="179" y="210"/>
          <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (5)" width="90" x="45" y="390">
            <list key="attribute_values">
              <parameter key="id" value="1"/>
              <parameter key="col3" value="88"/>
            </list>
            <list key="set_additional_roles">
              <parameter key="id" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Generate Data by User Specification (6)" width="90" x="45" y="480">
            <list key="attribute_values">
              <parameter key="id" value="2"/>
              <parameter key="col3" value="78"/>
            </list>
            <list key="set_additional_roles">
              <parameter key="id" value="id"/>
            </list>
          </operator>
          <operator activated="true" class="append" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Append (3)" width="90" x="179" y="390"/>
          <operator activated="true" class="collect" compatibility="6.1.000-SNAPSHOT" expanded="true" height="112" name="Collect" width="90" x="380" y="210"/>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_op="Collect" to_port="input 1"/>
          <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append (2)" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append (2)" to_port="example set 2"/>
          <connect from_op="Append (2)" from_port="merged set" to_op="Collect" to_port="input 2"/>
          <connect from_op="Generate Data by User Specification (5)" from_port="output" to_op="Append (3)" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (6)" from_port="output" to_op="Append (3)" to_port="example set 2"/>
          <connect from_op="Append (3)" from_port="merged set" to_op="Collect" to_port="input 3"/>
          <connect from_op="Collect" from_port="collection" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="multiply" compatibility="6.1.000-SNAPSHOT" expanded="true" height="94" name="Multiply (2)" width="90" x="246" y="30"/>
      <operator activated="true" class="select" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Select (2)" width="90" x="447" y="30"/>
      <operator activated="true" class="remember" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Remember" width="90" x="581" y="30">
        <parameter key="name" value="1"/>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Loop Collection" width="90" x="447" y="165">
        <parameter key="set_iteration_macro" value="true"/>
        <process expanded="true">
          <operator activated="true" class="branch" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Branch" width="90" x="112" y="120">
            <parameter key="condition_type" value="expression"/>
            <parameter key="condition_value" value="%{iteration}==1"/>
            <process expanded="true">
              <connect from_port="condition" to_port="input 1"/>
              <portSpacing port="source_condition" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_input 1" spacing="0"/>
              <portSpacing port="sink_input 2" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="recall" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Recall" width="90" x="112" y="75">
                <parameter key="name" value="1"/>
              </operator>
              <operator activated="true" class="join" compatibility="6.1.000-SNAPSHOT" expanded="true" height="76" name="Join" width="90" x="246" y="30">
                <list key="key_attributes"/>
              </operator>
              <operator activated="true" class="remember" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Remember (2)" width="90" x="380" y="30">
                <parameter key="name" value="1"/>
              </operator>
              <connect from_port="condition" to_op="Join" to_port="left"/>
              <connect from_op="Recall" from_port="result" to_op="Join" to_port="right"/>
              <connect from_op="Join" from_port="join" to_op="Remember (2)" to_port="store"/>
              <connect from_op="Remember (2)" from_port="stored" to_port="input 1"/>
              <portSpacing port="source_condition" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_input 1" spacing="0"/>
              <portSpacing port="sink_input 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="single" to_op="Branch" to_port="condition"/>
          <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="recall" compatibility="6.1.000-SNAPSHOT" expanded="true" height="60" name="Recall (2)" width="90" x="581" y="165">
        <parameter key="name" value="1"/>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Select (2)" to_port="collection"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Select (2)" from_port="selected" to_op="Remember" to_port="store"/>
      <connect from_op="Recall (2)" from_port="result" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
RMStaff

Re: Problem with combining all example set from IO Object Collection

I used the XML you posted in my RapidMiner (v 7.5.001) and it worked perfectly.

Did I miss something?

 

Best,

Edin

Super Contributor

Re: Problem with combining all example set from IO Object Collection

When I run your XML, the result is an IOObjectCollection with 3 example sets that are not yet joined into one example set. That's why I looked for another reference and found the other XML (in my previous reply).