Combining IOCollection into one ExampleSet [SOLVED]

Mickey Member Posts: 6 Contributor II
edited November 2018 in Help
Hello, I'm new to RapidMiner, and I'm using RM 5 with its excellent GUI.

I'm stuck on something that should be simple. I want to combine several collected data sets (they are gathered in one IOCollection) into a single example set, which I can then sort. How can I do that?

More details:
I have an attribute denoting the group that each example belongs to (there are around 300 groups).
I used a value loop with a nested example filter so that I can analyze each group separately (adding new attributes such as "outlier").
Now I want to combine these per-group results into one example set. How do I do that?

Answers

  • SebastianLoh Member Posts: 99 Contributor II
    Hi Mickey,

    could you post your process, please?

    Ciao Sebastian

    P.S. Copy and paste the XML code of the process from RapidMiner. Use the "insert code" button (#) in the forum editor to post code.
  • Mickey Member Posts: 6 Contributor II
    Sorry for the delay!

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="581" width="681">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//SA/2009-12-06 sampled 6 PCA"/>
          </operator>
          <operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID" width="90" x="313" y="165">
            <parameter key="create_nominal_ids" value="true"/>
          </operator>
          <operator activated="true" class="loop_values" expanded="true" height="76" name="Loop Values" width="90" x="447" y="165">
            <parameter key="attribute" value="Computer"/>
            <process expanded="true" height="583" width="663">
              <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="112" y="75">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="Computer=%{loop_value}"/>
              </operator>
              <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="246" y="75">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="PC6|PC5|PC4|PC3|PC2|PC1"/>
              </operator>
              <operator activated="true" class="detect_outlier_lof" expanded="true" height="76" name="Detect Outlier (2)" width="90" x="380" y="30"/>
              <operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="514" y="75"/>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Detect Outlier (2)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="original" to_op="Join" to_port="right"/>
              <connect from_op="Detect Outlier (2)" from_port="example set output" to_op="Join" to_port="left"/>
              <connect from_op="Join" from_port="join" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    As you can see, the Loop Values output is connected directly to the result ("out 1" to "result 1"). It is a collection (shown as a double line in the GUI) of example sets with identical attributes (of course, not identical VALUES). I'd like to combine all example sets in the collection into one example set (i.e. one table), possibly by simply "pasting the lines" one after another.
  • Terje Member Posts: 2 Contributor I
    I'd expect to be able to use Flatten Collection to turn an IOCollection into an ExampleSet. Unfortunately it doesn't work (yet?) and lacks documentation that could confirm its intended functionality.

    The Append operator does turn an IOCollection into an ExampleSet when the collection is connected to a single input port. Unfortunately, it is also very slow.
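
    A minimal sketch of that wiring, written as a fragment for the top-level process posted above. It assumes the operator class is "append" and that its ports are named "example set 1" and "merged set" in RM 5; please check these names in the GUI before relying on them. The x/y coordinates are arbitrary layout values. The two connect lines would take the place of the direct "out 1" to "result 1" connection.

    ```xml
    <!-- Sketch only: class and port names are assumptions, not taken from the posted process. -->
    <!-- Append receives the whole collection from Loop Values on a single input port. -->
    <operator activated="true" class="append" expanded="true" height="76" name="Append" width="90" x="581" y="165"/>
    <!-- Replaces: Loop Values "out 1" connected directly to "result 1". -->
    <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    ```

    With this in place, "result 1" should receive one example set containing the rows of every per-group example set, one after another.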

    Ideally, I'd like RapidMiner operators to handle the conversion transparently, with no need to convert explicitly from IOCollection to ExampleSet.
  • Mickey Member Posts: 6 Contributor II
    Thank you! The "Append" operator seems to do exactly what I wanted!
    Neither the description nor the GUI shape of "Append" suggests that it supports collections, which is why I never found it :(
    I agree about "Flatten Collection": it was the first thing I tried.

  • Terje Member Posts: 2 Contributor I
    I still wonder: what is the "official" recommended way to turn Collections into ExampleSets? Append is too slow for me. I have tried using Flatten Collection followed by a Select, but that didn't do the trick (that is, it did not produce an ExampleSet from the Collections coming out of a Loop Values operation).