[SOLVED] Selecting attribute subset specified in a "meta" exampleset

tennenrishin
Hi

Suppose we have examplesetA with a (design-time-unknown) set of attributes:
E.g.
a b c d e <-- attributes
1 2 3 4 5 <-- example 1
1 2 3 4 5 <-- example 2

We also have examplesetB holding (design-time-unknown) examples:
E.g.
X <-- attribute
a <-- example 1
d <-- example 2
e <-- example 3

The output that is needed is
a d e
1 4 5
1 4 5

The examplesetB specifies how we want to select attributes in examplesetA.

What is the best way to do this in RM?
(
Should I create a regex by looping over the examples in B?
Should I select the attributes one at a time by looping over the examples in B, and join them all together in that loop?
Is there another simpler way that I'm not thinking about?
)
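To make the first option concrete, here is a rough sketch of the logic in Python, outside RM, just to pin down what "create a regex from B" would mean. The lists are stand-ins for the two example sets and `build_attribute_regex` is a made-up helper name, not an RM operator. Note that Python's `re.escape` is an assumption for illustration; RM uses Java regex syntax, so the escaping there may differ.

```python
import re

def build_attribute_regex(names):
    """Join attribute names into one alternation, escaping regex metacharacters."""
    return "|".join(re.escape(name) for name in names)

# Stand-ins for the two example sets in the question (hypothetical data).
attributes_a = ["a", "b", "c", "d", "e"]  # attribute names of examplesetA
values_b = ["a", "d", "e"]                # values of attribute X in examplesetB

pattern = build_attribute_regex(values_b)

# Keep only attributes whose whole name matches one of the alternatives.
selected = [name for name in attributes_a if re.fullmatch(pattern, name)]

print(pattern)
print(selected)
```

The resulting pattern is the kind of string that would go into a regular-expression attribute filter, presumably via a macro built inside the loop.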

Thanks for any help.

Answers

  • MariusHelf
    Hi,

    I often go for the first approach of creating a regex from the second example set.

    But if your actual data does not contain missing values, you could also define attribute X in Example Set B as Id, transpose it, remove all attributes (with Filter Example Range), use the Union operator to combine it with examplesetA, and then remove all attributes that contain missing values via Filter Attributes.

    Good luck!
    ~Marius
  • tennenrishin
    That's clever! I'll try it out and report back.

    Thanks!
  • tennenrishin
    I don't think I fully understood.

    So this is what I ended up with
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
       <process expanded="true" height="641" width="1040">
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="112" y="75">
           <list key="attribute_values">
             <parameter key="a" value="1"/>
             <parameter key="b" value="2"/>
             <parameter key="c" value="3"/>
             <parameter key="d" value="4"/>
             <parameter key="e" value="5"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="165">
           <list key="attribute_values">
             <parameter key="a" value="1"/>
             <parameter key="b" value="2"/>
             <parameter key="c" value="3"/>
             <parameter key="d" value="4"/>
             <parameter key="e" value="5"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="append" compatibility="5.2.008" expanded="true" height="94" name="examplesetA" width="90" x="313" y="120"/>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="a" width="90" x="112" y="300">
           <list key="attribute_values">
             <parameter key="X" value="&quot;a&quot;"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="d" width="90" x="112" y="390">
           <list key="attribute_values">
             <parameter key="X" value="&quot;d&quot;"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="e" width="90" x="112" y="480">
           <list key="attribute_values">
             <parameter key="X" value="&quot;e&quot;"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="append" compatibility="5.2.008" expanded="true" height="112" name="examplesetB" width="90" x="313" y="390"/>
         <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="447" y="345">
           <parameter key="name" value="X"/>
           <parameter key="target_role" value="id"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="transpose" compatibility="5.2.008" expanded="true" height="76" name="Transpose" width="90" x="581" y="345"/>
         <operator activated="true" class="filter_example_range" compatibility="5.2.008" expanded="true" height="76" name="Filter Example Range" width="90" x="715" y="345">
           <parameter key="first_example" value="1"/>
           <parameter key="last_example" value="1"/>
         </operator>
         <operator activated="true" class="union" compatibility="5.2.008" expanded="true" height="76" name="Union" width="90" x="849" y="255"/>
         <connect from_op="Generate Data by User Specification" from_port="output" to_op="examplesetA" to_port="example set 1"/>
         <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="examplesetA" to_port="example set 2"/>
         <connect from_op="examplesetA" from_port="merged set" to_op="Union" to_port="example set 1"/>
         <connect from_op="a" from_port="output" to_op="examplesetB" to_port="example set 1"/>
         <connect from_op="d" from_port="output" to_op="examplesetB" to_port="example set 2"/>
         <connect from_op="e" from_port="output" to_op="examplesetB" to_port="example set 3"/>
         <connect from_op="examplesetB" from_port="merged set" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Transpose" to_port="example set input"/>
         <connect from_op="Transpose" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
         <connect from_op="Filter Example Range" from_port="example set output" to_op="Union" to_port="example set 2"/>
         <connect from_op="Union" from_port="union" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="252"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    Can you elaborate a bit more on this part?
    "remove all attributes (with Filter Example Range)"
  • tennenrishin
    This modification works, but it's rather clumsy.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
       <process expanded="true" height="641" width="1036">
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="112" y="75">
           <list key="attribute_values">
             <parameter key="a" value="1"/>
             <parameter key="b" value="2"/>
             <parameter key="c" value="3"/>
             <parameter key="d" value="4"/>
             <parameter key="e" value="5"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="165">
           <list key="attribute_values">
             <parameter key="a" value="1"/>
             <parameter key="b" value="2"/>
             <parameter key="c" value="3"/>
             <parameter key="d" value="4"/>
             <parameter key="e" value="5"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="append" compatibility="5.2.008" expanded="true" height="94" name="examplesetA" width="90" x="313" y="120"/>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="a" width="90" x="112" y="300">
           <list key="attribute_values">
             <parameter key="X" value="&quot;a&quot;"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="d" width="90" x="112" y="390">
           <list key="attribute_values">
             <parameter key="X" value="&quot;d&quot;"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="e" width="90" x="112" y="480">
           <list key="attribute_values">
             <parameter key="X" value="&quot;e&quot;"/>
           </list>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="append" compatibility="5.2.008" expanded="true" height="112" name="examplesetB" width="90" x="246" y="390"/>
         <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="345">
           <list key="function_descriptions">
             <parameter key="selected" value="1"/>
           </list>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="514" y="345">
           <parameter key="name" value="X"/>
           <parameter key="target_role" value="id"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="transpose" compatibility="5.2.008" expanded="true" height="76" name="Transpose" width="90" x="648" y="345"/>
         <operator activated="true" class="union" compatibility="5.2.008" expanded="true" height="76" name="Union" width="90" x="447" y="165"/>
         <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes (2)" width="90" x="581" y="165">
           <parameter key="attribute_filter_type" value="no_missing_values"/>
           <parameter key="numeric_condition" value="&quot;&gt;=0&quot;"/>
         </operator>
         <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="715" y="165">
           <list key="function_descriptions">
             <parameter key="missingId" value="missing(id)!=false"/>
           </list>
         </operator>
         <operator activated="true" class="filter_examples" compatibility="5.2.008" expanded="true" height="76" name="Filter Examples" width="90" x="849" y="165">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="missingId=1"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="849" y="345">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attribute" value="id"/>
           <parameter key="attributes" value="|id|missingId"/>
           <parameter key="invert_selection" value="true"/>
           <parameter key="include_special_attributes" value="true"/>
         </operator>
         <connect from_op="Generate Data by User Specification" from_port="output" to_op="examplesetA" to_port="example set 1"/>
         <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="examplesetA" to_port="example set 2"/>
         <connect from_op="examplesetA" from_port="merged set" to_op="Union" to_port="example set 1"/>
         <connect from_op="a" from_port="output" to_op="examplesetB" to_port="example set 1"/>
         <connect from_op="d" from_port="output" to_op="examplesetB" to_port="example set 2"/>
         <connect from_op="e" from_port="output" to_op="examplesetB" to_port="example set 3"/>
         <connect from_op="examplesetB" from_port="merged set" to_op="Generate Attributes" to_port="example set input"/>
         <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Transpose" to_port="example set input"/>
         <connect from_op="Transpose" from_port="example set output" to_op="Union" to_port="example set 2"/>
         <connect from_op="Union" from_port="union" to_op="Select Attributes (2)" to_port="example set input"/>
         <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
         <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
         <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="252"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    Unless there is a way to simplify this a lot, and to accommodate missing values in the data set, I guess I'll have to go with one of the other approaches.

    Some kind of operator that can do attribute set intersections could be handy sometimes, maybe.
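
For reference, the attribute set intersection wished for above can be sketched outside RapidMiner. This is a minimal Python illustration, assuming the example set is modelled as a list of dicts (a stand-in, not RapidMiner's data model); `intersect_attributes` is a hypothetical helper name. Because it only compares attribute names and never inspects the data, missing values in examplesetA should not affect the selection, unlike the Union-based workaround.

```python
def intersect_attributes(examples, wanted):
    """Keep only the attributes named in `wanted`, in their original order."""
    wanted = set(wanted)
    if not examples:
        return []
    # Attribute order is taken from the first example.
    keep = [name for name in examples[0] if name in wanted]
    return [{name: row[name] for name in keep} for row in examples]

# Hypothetical stand-in for examplesetA; None marks a missing value.
exampleset_a = [
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5},
    {"a": 1, "b": 2, "c": 3, "d": None, "e": 5},
]
# Values of attribute X in examplesetB; "z" does not exist in A and is ignored.
exampleset_b = ["a", "d", "e", "z"]

result = intersect_attributes(exampleset_a, exampleset_b)
print(result)
```

The column `d` survives even though one of its values is missing, which is the case the transpose-and-Union approach cannot handle.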