IOObjectCollection to ExampleSet after loop

cbaslan Member Posts: 6 Contributor II
edited November 2018 in Help

Hi all,

In my project I used a loop operator, and naturally it produced an IOObjectCollection. But some of the ExampleSets in this collection are empty, so I can't use the Append operator to combine them. Is there a way to append them anyway? The collection has 258 ExampleSets, so doing it manually would be a nightmare. I only need the ones that actually contain data.

Thanks

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Can you use Union or another operator to combine them into a single set?

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    Hi Cbaslan,

    What kind of empty sets do you have? Do they still have a header? If so, you can put a Branch operator inside the loop with its condition type set to min_examples (condition value 1). If a data set contains at least 1 example, keep it; otherwise discard it.

    One question for you: can you guarantee that all the non-empty example sets have exactly the same headers before we append them? If so, you can drop the empty sets with Branch and then append the resulting collection directly.

    Here is an example process:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf-Testset" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Golf (2)" width="90" x="45" y="238">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
    </operator>
    <operator activated="true" breakpoints="after" class="filter_examples" compatibility="7.3.000" expanded="true" height="103" name="Filter Examples: get some dummy empty data" width="90" x="179" y="238">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Temperature.gt.100"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Filter Examples: get some dummy empty data</description>
    </operator>
    <operator activated="true" class="subprocess" compatibility="7.3.000" expanded="true" height="124" name="Subprocess" width="90" x="514" y="34">
    <process expanded="true">
    <operator activated="true" breakpoints="after" class="collect" compatibility="7.3.000" expanded="true" height="124" name="Collect" width="90" x="45" y="34"/>
    <operator activated="true" breakpoints="after" class="loop_collection" compatibility="7.3.000" expanded="true" height="82" name="Loop Collection" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="branch" compatibility="7.3.000" expanded="true" height="82" name="Branch" width="90" x="246" y="34">
    <parameter key="condition_type" value="min_examples"/>
    <parameter key="condition_value" value="1"/>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="10" y="10">Keep the data if non-empty</description>
    </process>
    <process expanded="true">
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="43" y="10">Discard the empty data set</description>
    </process>
    </operator>
    <connect from_port="single" to_op="Branch" to_port="condition"/>
    <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="7.3.000" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
    <connect from_port="in 1" to_op="Collect" to_port="input 1"/>
    <connect from_port="in 2" to_op="Collect" to_port="input 2"/>
    <connect from_port="in 3" to_op="Collect" to_port="input 3"/>
    <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
    <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="source_in 3" spacing="0"/>
    <portSpacing port="source_in 4" spacing="144"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="72"/>
    </process>
    </operator>
    <connect from_op="Golf" from_port="output" to_op="Subprocess" to_port="in 1"/>
    <connect from_op="Golf-Testset" from_port="output" to_op="Subprocess" to_port="in 2"/>
    <connect from_op="Golf (2)" from_port="output" to_op="Filter Examples: get some dummy empty data" to_port="example set input"/>
    <connect from_op="Filter Examples: get some dummy empty data" from_port="example set output" to_op="Subprocess" to_port="in 3"/>
    <connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    HTH,

    YY

  • cbaslan Member Posts: 6 Contributor II

    Thanks, all. I tried Union, but no luck there. The second answer seems to be the one, because the sets all have the same header. But here is a newbie question: I can't find how to use this XML in the new version of RM.

  • kayman Member Posts: 662 Unicorn

    View -> Show Panel -> XML, then paste the process XML into that panel and apply it.

  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    The Append operator can handle example sets with no rows; obviously, those sets still have their headers (column names, data types, etc.).

    Maybe your problem is a different number or type of columns? The Append operator expects exactly the same columns in all the sets you are trying to append.
