How to automatically select subsets of attributes whose names have been varied

qwertzqwertz Member Posts: 130 Contributor II
edited November 2018 in Help

Dear all,

I am trying to reduce the manual steps in my current process, but I have reached a point where I am stuck.

The original example set looks like this:
att1 att2 att3
1 3 6
2 4 7
3 5 8
4 6 9

After the first steps of processing (I do windowing) I receive a new set with more attributes:
label att1-0 att1-1 att2-0 att2-1 att3-0 att3-1

Then I want to split this set into subsets, and this is where I struggle. Each subset should include the label and all variations of one original attribute:
First iteration: label, att1-0 and att1-1
Second iteration: label, att2-0 and att2-1
Third iteration: label, att3-0 and att3-1

My intention is to run a selection algorithm on each subset next and finally merge the remaining attributes into a single set again:
e.g. label, att1-0, att2-1, att3-0 (attributes chosen by the selection algorithm)


I already tried operators like "Loop Attributes" and "Work on Subset", as well as combinations of both, but so far I have not been able to make it work. Could anyone please give me a clue as to which approach could be promising?


Kind regards
Sachs

PS: The names and quantity of the attributes may change every time I run the process...

Answers

  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Sachs,

    indeed, this is not a trivial problem. The trick is to apply the windowing before a Loop Attributes operator, and to loop over the *original* attributes in that operator. Then, inside the loop, you can use a regular expression to select all window attributes that originate from the current attribute.
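
    To illustrate the name matching outside of RapidMiner, here is a minimal Python sketch of what the regular expression does. The function name window_attributes and the sample names are only assumptions for illustration; the pattern mirrors the "%{loop_attribute}-.*" expression used in the process below.

```python
import re

def window_attributes(original_name, windowed_names):
    """Return the windowed names that were derived from one original attribute."""
    # Escape the name so characters such as '.' or '+' are taken literally,
    # then require the '-<index>' suffix that the windowing appends.
    pattern = re.compile(re.escape(original_name) + r"-.*")
    # fullmatch: the whole attribute name has to fit the pattern.
    return [name for name in windowed_names if pattern.fullmatch(name)]

windowed = ["label", "att1-0", "att1-1", "att2-0", "att2-1", "att3-0", "att3-1"]
for original in ["att1", "att2", "att3"]:
    print(original, window_attributes(original, windowed))
```

    Because the hyphen is part of the pattern, a name like att1 does not accidentally pick up att10-0.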

    The problem is that Loop Attributes has only one input port, and the original attributes obviously have to be connected to that port. That means we have to pass the windowed data into the loop with a Remember/Recall combination.
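
    As a rough analogy only (this is not RapidMiner code), the Remember/Recall combination behaves like a store that is shared by name. In the sketch below, store, remember and recall are hypothetical stand-ins, and dropping the "-1" variant is just a placeholder for the real work done in each iteration.

```python
store = {}  # hypothetical stand-in for the process-wide object store

def remember(name, obj):
    """Store an object under a name, like the Remember operator."""
    store[name] = obj
    return obj

def recall(name):
    """Fetch the object without removing it (remove_from_store = false)."""
    return store[name]

# Before the loop: remember the windowed data (only names are kept here).
remember("data", ["label", "att1-0", "att1-1", "att2-0", "att2-1", "att3-0", "att3-1"])

# The loop itself only receives the original attributes.
for original in ["att1", "att2", "att3"]:
    current = recall("data")
    # Placeholder step: drop the '-1' variant of the current attribute.
    updated = [name for name in current if name != original + "-1"]
    # Store the result under the same name so the next iteration continues from it.
    remember("data", updated)

print(recall("data"))  # ['label', 'att1-0', 'att2-0', 'att3-0']
```

    Storing the result under the same name in every iteration is what leaves a single data set at the end instead of one result per iteration.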

    Once that is done, two Select Attributes operators are used to split the attribute set into the attributes of interest and all other attributes. Do your attribute selection on the interesting ones, and then use Join to join the result together with the other attributes.
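
    The split, process and join step can be sketched in Python as follows. This is only an illustration under assumptions: the table is a plain dict of columns, rows stay aligned by position (which stands in for the ID key that Join uses in the process), and keep_first is a hypothetical placeholder for a real selection algorithm.

```python
def split_process_join(table, selected, process):
    """Split a table (column name -> list of values) into the selected columns
    and all others, run `process` on the selected part, and merge both again."""
    subset = {name: col for name, col in table.items() if name in selected}
    rest = {name: col for name, col in table.items() if name not in selected}
    reduced = process(subset)  # may drop some of the selected columns
    merged = dict(rest)
    merged.update(reduced)
    return merged

def keep_first(columns):
    """Hypothetical selection step: keep only the first column."""
    first = next(iter(columns))
    return {first: columns[first]}

# Illustrative values only.
table = {
    "label": [3, 4],
    "att1-0": [1, 2],
    "att1-1": [2, 3],
    "att2-0": [3, 4],
    "att2-1": [4, 5],
}
result = split_process_join(table, {"att1-0", "att1-1"}, keep_first)
print(sorted(result))
```

    The columns that were not selected pass through unchanged, so repeating this for every original attribute gradually reduces the full set.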

    In the example process below, you can do your stuff in the subprocess called "Do something".

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="464" width="792">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="120"/>
          <operator activated="true" class="generate_id" compatibility="5.3.000" expanded="true" height="76" name="Generate ID" width="90" x="179" y="120"/>
          <operator activated="true" class="series:windowing" compatibility="5.2.001" expanded="true" height="76" name="Windowing" width="90" x="313" y="120">
            <parameter key="window_size" value="3"/>
          </operator>
          <operator activated="true" class="remember" compatibility="5.3.000" expanded="true" height="60" name="Remember" width="90" x="447" y="30">
            <parameter key="name" value="data"/>
            <parameter key="io_object" value="ExampleSet"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="5.3.000" expanded="true" height="76" name="Loop Attributes" width="90" x="581" y="120">
            <process expanded="true" height="464" width="815">
              <operator activated="true" class="recall" compatibility="5.3.000" expanded="true" height="60" name="Recall" width="90" x="45" y="30">
                <parameter key="name" value="data"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="remove_from_store" value="false"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.3.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="%{loop_attribute}-.*"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.3.000" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="165">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="%{loop_attribute}-.*"/>
                <parameter key="invert_selection" value="true"/>
              </operator>
              <operator activated="true" class="subprocess" compatibility="5.3.000" expanded="true" height="76" name="Do Something" width="90" x="313" y="30">
                <process expanded="true" height="464" width="792">
                  <operator activated="false" class="select_attributes" compatibility="5.3.000" expanded="true" height="76" name="Select Attributes (3)" width="90" x="179" y="75">
                    <parameter key="attribute_filter_type" value="single"/>
                    <parameter key="attribute" value="att2-1"/>
                    <parameter key="invert_selection" value="true"/>
                  </operator>
                  <connect from_port="in 1" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="join" compatibility="5.3.000" expanded="true" height="76" name="Join" width="90" x="447" y="30">
                <list key="key_attributes"/>
              </operator>
              <operator activated="true" class="remember" compatibility="5.3.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="30">
                <parameter key="name" value="data"/>
                <parameter key="io_object" value="ExampleSet"/>
              </operator>
              <connect from_op="Recall" from_port="result" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Do Something" to_port="in 1"/>
              <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
              <connect from_op="Do Something" from_port="out 1" to_op="Join" to_port="left"/>
              <connect from_op="Join" from_port="join" to_op="Remember (2)" to_port="store"/>
              <connect from_op="Remember (2)" from_port="stored" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_op="Remember" to_port="store"/>
          <connect from_op="Windowing" from_port="original" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • qwertzqwertz Member Posts: 130 Contributor II

    Hi Marius,

    thank you for this example. That looks really advanced!

    Just to share some ideas: Here is some code I meanwhile developed but it doesn't work yet.
    The main problem is that the Loop operator not only performs an operation x times but also delivers a result after each iteration. That way I receive a collection instead of a single data set. (And if I use the "Append" operator, all the examples are duplicated.)

    So wouldn't it be handy to have a kind of "loop" operator which only repeats an operation several times AND which provides a single data set in the end? Or have a kind of iteration macro in the "work on subset" operator.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="557" width="902">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="10"/>
            <parameter key="number_of_attributes" value="3"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="5.2.008" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="30">
            <process expanded="true" height="540" width="622">
              <operator activated="true" class="provide_macro_as_log_value" compatibility="5.2.008" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="45" y="30">
                <parameter key="macro_name" value="loop_attribute"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="179" y="30">
                <list key="log">
                  <parameter key="cn" value="operator.Provide Macro as Log Value.value.macro_value"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Provide Macro as Log Value" to_port="through 1"/>
              <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="5.2.008" expanded="true" height="60" name="Extract Macro (2)" width="90" x="313" y="30">
            <parameter key="macro" value="int"/>
            <parameter key="macro_type" value="number_of_attributes"/>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="447" y="30">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="2"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="att1"/>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="5.2.008" expanded="true" height="94" name="Log to Data" width="90" x="581" y="30">
            <parameter key="log_name" value="Log"/>
          </operator>
          <operator activated="true" class="loop" compatibility="5.2.008" expanded="true" height="94" name="Loop" width="90" x="715" y="30">
            <parameter key="set_iteration_macro" value="true"/>
            <parameter key="iterations" value="%{int}"/>
            <process expanded="true" height="557" width="622">
              <operator activated="true" class="extract_macro" compatibility="5.2.008" expanded="true" height="60" name="Extract Macro" width="90" x="112" y="30">
                <parameter key="macro" value="att_name"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="cn"/>
                <parameter key="example_index" value="%{iteration}"/>
              </operator>
              <operator activated="true" class="work_on_subset" compatibility="5.2.008" expanded="true" height="76" name="Work on Subset" width="90" x="179" y="120">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="%{att_name}.*"/>
                <parameter key="keep_subset_only" value="true"/>
                <process expanded="true" height="557" width="622">
                  <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="30">
                    <parameter key="attribute_filter_type" value="single"/>
                    <parameter key="attribute" value="%{att_name}-0"/>
                  </operator>
                  <operator activated="false" class="weight_by_correlation" compatibility="5.2.008" expanded="true" height="76" name="Weight by Correlation" width="90" x="112" y="165">
                    <parameter key="normalize_weights" value="false"/>
                  </operator>
                  <operator activated="false" class="select_by_weights" compatibility="5.2.008" expanded="true" height="94" name="Select by Weights" width="90" x="246" y="165">
                    <parameter key="weight_relation" value="top k"/>
                    <parameter key="k" value="3"/>
                  </operator>
                  <connect from_port="exampleSet" to_op="Select Attributes" to_port="example set input"/>
                  <connect from_op="Select Attributes" from_port="example set output" to_port="example set"/>
                  <connect from_op="Weight by Correlation" from_port="weights" to_op="Select by Weights" to_port="weights"/>
                  <connect from_op="Weight by Correlation" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
                  <portSpacing port="source_exampleSet" spacing="0"/>
                  <portSpacing port="sink_example set" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
              <connect from_port="input 2" to_op="Work on Subset" to_port="example set"/>
              <connect from_op="Work on Subset" from_port="example set" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="source_input 3" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="90"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
          <connect from_op="Extract Macro (2)" from_port="example set" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_op="Loop" to_port="input 1"/>
          <connect from_op="Log to Data" from_port="through 1" to_op="Loop" to_port="input 2"/>
          <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Kind regards
    Sachs