RapidMiner

IOObjectCollection to ExampleSet after loop

Learner I cbaslan

Hi all,

In my project I used a loop operator, which naturally produced an IOObjectCollection. Some of the ExampleSets in this collection are empty, so I can't use the Append operator to combine them. Is there a way to append them anyway? The collection holds 258 ExampleSets, so doing it manually would be a nightmare. I only need the ones that contain data.

Thanks

5 REPLIES
RM Certified Expert

Re: IOObjectCollection to ExampleSet after loop

Can you use a Union or another operator to make an entire set?

RM Staff

Re: IOObjectCollection to ExampleSet after loop

Hi Cbaslan,

What kind of empty set do you have? Does it have a header? If possible, try a 'Branch' operator with the condition type set to min_examples. If there is at least 1 example in the data set, keep the data; otherwise discard it.

One question for you: can you guarantee that all the non-empty example sets have exactly the same headers before we append them? If so, you can drop the empty sets using Branch and then append the resulting collection directly.

Here is an example process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf-Testset" width="90" x="45" y="136">
        <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf (2)" width="90" x="45" y="238">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" breakpoints="after" class="filter_examples" compatibility="7.3.000" expanded="true" height="103" name="Filter Examples: get some dummy empty data" width="90" x="179" y="238">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Temperature.gt.100"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">Filter Examples: get some dummy empty data</description>
      </operator>
      <operator activated="true" class="subprocess" compatibility="7.3.000" expanded="true" height="124" name="Subprocess" width="90" x="514" y="34">
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="collect" compatibility="7.3.000" expanded="true" height="124" name="Collect" width="90" x="45" y="34"/>
          <operator activated="true" breakpoints="after" class="loop_collection" compatibility="7.3.000" expanded="true" height="82" name="Loop Collection" width="90" x="179" y="34">
            <process expanded="true">
              <operator activated="true" class="branch" compatibility="7.3.000" expanded="true" height="82" name="Branch" width="90" x="246" y="34">
                <parameter key="condition_type" value="min_examples"/>
                <parameter key="condition_value" value="1"/>
                <process expanded="true">
                  <connect from_port="condition" to_port="input 1"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                  <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="10" y="10">Keep the data if non-empty</description>
                </process>
                <process expanded="true">
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                  <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="43" y="10">Discard the empty data set</description>
                </process>
              </operator>
              <connect from_port="single" to_op="Branch" to_port="condition"/>
              <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="7.3.000" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
          <connect from_port="in 1" to_op="Collect" to_port="input 1"/>
          <connect from_port="in 2" to_op="Collect" to_port="input 2"/>
          <connect from_port="in 3" to_op="Collect" to_port="input 3"/>
          <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="source_in 3" spacing="0"/>
          <portSpacing port="source_in 4" spacing="144"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="72"/>
        </process>
      </operator>
      <connect from_op="Golf" from_port="output" to_op="Subprocess" to_port="in 1"/>
      <connect from_op="Golf-Testset" from_port="output" to_op="Subprocess" to_port="in 2"/>
      <connect from_op="Golf (2)" from_port="output" to_op="Filter Examples: get some dummy empty data" to_port="example set input"/>
      <connect from_op="Filter Examples: get some dummy empty data" from_port="example set output" to_op="Subprocess" to_port="in 3"/>
      <connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
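For anyone who finds the XML hard to read, the logic of the process above can be sketched outside RapidMiner in plain Python. This is only an illustrative model, not RapidMiner API: the dictionaries with `header` and `rows`, and the function names `drop_empty` and `append_sets`, are my own assumptions standing in for ExampleSets, the Branch check inside Loop Collection, and Append.

```python
# Illustrative model only: each "example set" is a dict with a header and rows.
# None of these names come from RapidMiner.

def drop_empty(collection, min_examples=1):
    """Keep only the sets with at least `min_examples` rows (the Branch check, per set)."""
    return [s for s in collection if len(s["rows"]) >= min_examples]


def append_sets(collection):
    """Concatenate the rows of all sets, assuming they share one header."""
    if not collection:
        return {"header": [], "rows": []}
    merged = []
    for s in collection:
        merged.extend(s["rows"])
    return {"header": collection[0]["header"], "rows": merged}


golf = {"header": ["Outlook", "Temperature"], "rows": [["sunny", 85], ["rain", 70]]}
golf_test = {"header": ["Outlook", "Temperature"], "rows": [["overcast", 64]]}
# Stand-in for the dummy set produced by filtering on Temperature > 100
empty = {"header": ["Outlook", "Temperature"], "rows": []}

result = append_sets(drop_empty([golf, golf_test, empty]))
print(len(result["rows"]))
```

The empty set is dropped before the append step, so only the rows of the two non-empty sets end up in the result.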

HTH,

YY

Learner I cbaslan

Re: IOObjectCollection to ExampleSet after loop

Thanks all. I tried Union but had no luck there. The second answer seems to be the one, because the sets all have the same header. But here is a newbie question: I can't find how to use this XML in the new version of RM.

Maven

Re: IOObjectCollection to ExampleSet after loop

View -> Show Panel -> XML, then paste the process XML into that panel and apply the changes.

Re: IOObjectCollection to ExampleSet after loop

The Append operator can handle ExampleSets with no rows; obviously you will still have the headers (column names, data types, etc.).

Maybe your problem is a different number or type of columns?
The Append operator expects exactly the same columns in all the sets you are trying to append.
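To make the "same columns" requirement concrete, here is a rough sketch in plain Python of the kind of check you would want to pass before appending. It is not RapidMiner code: the `(name, type)` tuples used as headers and the function `find_header_mismatches` are assumptions made up for illustration.

```python
# Illustrative model only: a header is a list of (column name, data type) pairs.

def find_header_mismatches(collection):
    """Return the indices of sets whose header differs from the first set's header."""
    if not collection:
        return []
    reference = collection[0]["header"]
    return [i for i, s in enumerate(collection) if s["header"] != reference]


sets = [
    # An empty set still carries its header, so it can be compared like any other
    {"header": [("Outlook", "nominal"), ("Temperature", "integer")], "rows": []},
    {"header": [("Outlook", "nominal"), ("Temperature", "integer")], "rows": [["sunny", 85]]},
    # Same column names, but Temperature has a different type
    {"header": [("Outlook", "nominal"), ("Temperature", "real")], "rows": [["rain", 70.5]]},
]

mismatches = find_header_mismatches(sets)
print(mismatches)
```

In this toy data the third set is flagged because of the type difference, while the empty first set passes, which mirrors the point that empty rows are not the problem but differing columns are.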