Reuse results of loop while removing attributes

qwertz2 Member Posts: 49 Guru
edited November 2018 in Help

 

Dear all,

 

Here is a feature proposal I would like to discuss with you. I just came across a process in which I wanted to remove attributes within a loop. This can be done with "Loop Attributes" and a nested "Select Attributes" operator.

 

However, I discovered that the loop runs as many times as there were attributes in the set when the loop started. In my case I remove attributes while looping and reuse the result, but the loop still iterates over attributes that no longer exist after the removal.

 

Feature proposal: Add an option that restricts the loop iterations to attributes that still exist when "reuse results" is activated.

 

This sample process removes att2 in the first iteration. However, the console shows that there are still three iterations, for att1, att2 and att3.

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
<parameter key="number_examples" value="25"/>
<parameter key="number_of_attributes" value="3"/>
</operator>
<operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.001" expanded="true" height="82" name="Loop Attributes" width="90" x="179" y="34">
<parameter key="reuse_results" value="true"/>
<process expanded="true">
<operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="att2.*"/>
<parameter key="except_regular_expression" value="%{loop_attribute}"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="print_to_console" compatibility="7.5.001" expanded="true" height="82" name="Print to Console" width="90" x="179" y="34">
<parameter key="log_value" value="%{loop_attribute}"/>
</operator>
<connect from_port="input 1" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Print to Console" to_port="through 1"/>
<connect from_op="Print to Console" from_port="through 1" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
<connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
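
For readers without RapidMiner at hand, the behaviour of the process above and the proposed option can be modelled roughly as follows. This is only a sketch in Python, not RapidMiner code: `loop_attributes`, `drop_att2` and the `skip_removed` flag are made-up names, and `skip_removed` stands for the proposed option, which does not exist as a parameter today.

```python
def loop_attributes(attributes, body, skip_removed=False):
    """Run body(name, current) once per attribute and reuse its result.

    The list of names is fixed before the first iteration, which is the
    behaviour described in the post. skip_removed=True models the proposed
    option (hypothetical, not an existing parameter).
    """
    current = list(attributes)
    visited = []
    for name in list(attributes):  # snapshot taken before the loop starts
        if skip_removed and name not in current:
            continue  # attribute was removed by an earlier iteration
        visited.append(name)
        current = body(name, current)
    return visited, current


def drop_att2(name, current):
    # Stand-in for the nested Select Attributes: remove att2 on every pass.
    return [a for a in current if a != "att2"]


names = ["att1", "att2", "att3"]
print(loop_attributes(names, drop_att2))                     # att2 is still visited
print(loop_attributes(names, drop_att2, skip_removed=True))  # att2 is skipped
```

In both cases the resulting set is att1 and att3. The only difference is whether the loop body still runs for the removed att2.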

Cheers

Sachs

Best Answer

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    Solution Accepted

    Hi Sachs,

     

    challenge accepted :)

    I added a few Operators to your process and hope the result fits your needs.

    The macro %{a} holds the number of times the current Operator has been executed so far.

     

    Best,

    Edin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data" width="90" x="112" y="34">
    <parameter key="number_examples" value="25"/>
    <parameter key="number_of_attributes" value="3"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="246" y="34">
    <parameter key="window_size" value="5"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="7.5.001" expanded="true" height="82" name="Generate ID" width="90" x="112" y="136"/>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.001" expanded="true" height="82" name="Loop Attributes" width="90" x="246" y="136">
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="generate_macro" compatibility="7.5.001" expanded="true" height="82" name="Generate Macro" width="90" x="45" y="34">
    <list key="function_descriptions">
    <parameter key="expression" value="cut(%{loop_attribute},0,index(%{loop_attribute},&quot;-&quot;))"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="%{expression}.*"/>
    <parameter key="use_except_expression" value="true"/>
    <parameter key="except_regular_expression" value="%{loop_attribute}"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    <parameter key="except_regular_expression" value="%{loop_attribute}"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="447" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    <parameter key="except_regular_expression" value="%{loop_attribute}"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="branch" compatibility="7.5.001" expanded="true" height="82" name="Branch" width="90" x="447" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{a}==1"/>
    <process expanded="true">
    <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember" width="90" x="45" y="34">
    <parameter key="name" value="dataset"/>
    </operator>
    <connect from_port="condition" to_op="Remember" to_port="store"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall" width="90" x="45" y="34">
    <parameter key="name" value="dataset"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.5.001" expanded="true" height="82" name="Join" width="90" x="179" y="85">
    <parameter key="join_type" value="left"/>
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (2)" width="90" x="313" y="34">
    <parameter key="name" value="dataset"/>
    </operator>
    <connect from_port="condition" to_op="Join" to_port="right"/>
    <connect from_op="Recall" from_port="result" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Remember (2)" to_port="store"/>
    <portSpacing port="source_condition" spacing="63"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    </process>
    </operator>
    <connect from_port="input 1" to_op="Generate Macro" to_port="through 1"/>
    <connect from_op="Generate Macro" from_port="through 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Branch" to_port="condition"/>
    <connect from_op="Select Attributes (2)" from_port="original" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (2)" width="90" x="380" y="136">
    <parameter key="name" value="dataset"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (4)" width="90" x="514" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    <parameter key="except_regular_expression" value="%{loop_attribute}"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Recall (2)" from_port="result" to_op="Select Attributes (4)" to_port="example set input"/>
    <connect from_op="Select Attributes (4)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

Answers

  • qwertz2 Member Posts: 49 Guru

    For those of you who want to know more on the background / use case:

     

    It seems like a simple question, but in detail this one drove me mad: I am looking for a regular expression that takes a string from a macro (e.g. "att1-a") as a reference value.

    It should select all attribute names that have the same prefix before the "-" (here the prefix is "att1"), but NOT the attribute whose name is identical to the complete reference value.


    att1-a --> no match because list entry identical to reference
    att1-b --> match because prefix is the same
    att1-c --> match because prefix is the same
    att2-a --> no match because prefix is different
    att2-b --> no match because prefix is different
    att2-c --> no match because prefix is different
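
The matching rule in the list above can also be written as a single pattern with a negative lookahead. Below is a small sketch in Python to check the six cases; the function name `sibling_pattern` is made up for illustration, and I have not tested whether the same pattern behaves identically in the regular expression parameter of Select Attributes.

```python
import re


def sibling_pattern(reference):
    """Build a pattern for names that share the reference's prefix
    (the text before the first "-") but are not the reference itself."""
    prefix = reference.split("-", 1)[0]
    # (?!reference$) rejects the exact reference; the rest requires "prefix-".
    return re.compile(
        "(?!" + re.escape(reference) + "$)" + re.escape(prefix) + "-.*"
    )


names = ["att1-a", "att1-b", "att1-c", "att2-a", "att2-b", "att2-c"]
pattern = sibling_pattern("att1-a")
matches = [n for n in names if pattern.fullmatch(n)]
print(matches)  # ['att1-b', 'att1-c']
```

For the reference "att1-a" the generated pattern is `(?!att1\-a$)att1\-.*`, so one expression covers both the prefix test and the exclusion of the reference value.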

     

     

    I came close to the desired result, but looping over already removed attributes led to an empty set in the end.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="number_examples" value="25"/>
    <parameter key="number_of_attributes" value="3"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="179" y="34">
    <parameter key="window_size" value="5"/>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.001" expanded="true" height="82" name="Loop Attributes" width="90" x="313" y="34">
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="generate_macro" compatibility="7.5.001" expanded="true" height="82" name="Generate Macro" width="90" x="45" y="34">
    <list key="function_descriptions">
    <parameter key="expression" value="cut(%{loop_attribute},0,index(%{loop_attribute},&quot;-&quot;))"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="%{expression}.*"/>
    <parameter key="use_except_expression" value="true"/>
    <parameter key="except_regular_expression" value="%{loop_attribute}"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <connect from_port="input 1" to_op="Generate Macro" to_port="through 1"/>
    <connect from_op="Generate Macro" from_port="through 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
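
To make it easier to see why the process above ends with an empty set, here is a rough model of it in Python. It is a sketch under assumptions: the attribute names are illustrative rather than the exact names Windowing produces, and `remove_siblings` only imitates the inverted Select Attributes (drop everything matching the prefix, except the loop attribute).

```python
def remove_siblings(loop_attribute, current):
    # Prefix before the first "-", like the Generate Macro expression.
    prefix = loop_attribute.split("-", 1)[0]
    # Inverted selection: keep names without the prefix, plus the loop attribute.
    return [a for a in current if not a.startswith(prefix) or a == loop_attribute]


names = ["att1-1", "att1-0", "att2-1", "att2-0"]  # illustrative names
current = list(names)
for loop_attribute in names:  # the loop still visits removed attributes
    current = remove_siblings(loop_attribute, current)
    print(loop_attribute, "->", current)
```

The second iteration runs for att1-0, which the first iteration already removed. It therefore deletes att1-1 as a sibling while att1-0 itself is no longer there to be kept, so the whole att1 group disappears, and the same happens for att2.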
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Thanks for the idea! I've passed this on to the developers!

  • qwertz2 Member Posts: 49 Guru

     

    Hi Edin,

     

    That's a pretty impressive piece of code! Congratulations!

     

    Best regards

    Sachs
