The RapidMiner community is on read-only mode until further notice. Technical support via cases will continue to work as is. For any urgent licensing related requests from Students/Faculty members, please use the Altair academic forum here.

"[SOLVED] Passing variables in nested loops"

scepxkoscepxko Member Posts: 15 Maven
edited June 2019 in Help
Hello,

I would like to filter rows of a table using a blacklist

The structure of the blacklist is one column containing the name and a list of numbers ie:

NAME
123
562
588

The structure of the table to be filtered is three columns.
The first column contains numbers, the second and third have various content, ie:

A-----B-----C
123   text   text
123   text   text
562   text   text
455   text   text


During the process, the rows starting with "123" and "562" will be removed since these numbers are in the blacklist.

After a successful process, the result should be:

A-----B-----C
455   text   text


There are logically 2 loops involved: pick one blacklisted number, check if present in each row the table, take action if positive, then pick next blacklisted number.


The direct use of the operator "Filtering examples" did work without any problem when I tested using no loop for the blacklist (ie: parameter string: %{loop_attribute}=123).

When I implemented the loop for the blacklist (using loop_attribute2) (operator "Filtering examples" with parameter string %{loop_attribute}=%{loop_attribute2}), nothing worked.

I tried an alternative solution I read in another post, using the operator "Generate Attributes".

I already spent a few hours on it, tried many things but I get stuck.
The result table is empty.
If seems the value of the attribute "Name" (from the first loop) is somehow lost in the subprocesses. I get either an empty table or an unfiltered table (if i invert the filters). I missed something with the "Macro" operator... but what?

Maybe can a clear mind help me here?

Thank you in advance
Alex

Here is the code so far

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="753" width="300">
     <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="30">
       <parameter key="excel_file" value="E:\Rapidminer\Blacklisting\blacklist.xls"/>
       <list key="annotations">
         <parameter key="0" value="Name"/>
       </list>
       <list key="data_set_meta_data_information">
         <parameter key="0" value="Name.true.integer.attribute"/>
       </list>
     </operator>
     <operator activated="true" class="loop_examples" compatibility="5.2.008" expanded="true" height="94" name="Loop Examples" width="90" x="180" y="30">
       <parameter key="iteration_macro" value="example1"/>
       <process expanded="true" height="753" width="435">
         <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel (4)" width="90" x="45" y="30">
           <parameter key="excel_file" value="E:\Rapidminer\Blacklisting\to_filter.xls"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="A.true.integer.attribute"/>
             <parameter key="1" value="B.true.polynominal.attribute"/>
             <parameter key="2" value="C.true.polynominal.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="loop_examples" compatibility="5.2.008" expanded="true" height="76" name="Loop Examples (2)" width="90" x="179" y="30">
           <parameter key="iteration_macro" value="example2"/>
           <process expanded="true" height="753" width="435">
             <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes (4)" width="90" x="45" y="30">
               <list key="function_descriptions">
                 <parameter key="toRemove" value="if(&quot;%{example2}&quot;==&quot;%{example1}&quot;, true, false)"/>
               </list>
             </operator>
             <operator activated="true" class="filter_examples" compatibility="5.2.008" expanded="true" height="76" name="Filter Examples (5)" width="90" x="180" y="30">
               <parameter key="condition_class" value="attribute_value_filter"/>
               <parameter key="parameter_string" value="toRemove != true"/>
             </operator>
             <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes (5)" width="90" x="315" y="30">
               <parameter key="attribute_filter_type" value="single"/>
               <parameter key="attribute" value="toRemove"/>
               <parameter key="invert_selection" value="true"/>
             </operator>
             <connect from_port="example set" to_op="Generate Attributes (4)" to_port="example set input"/>
             <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Filter Examples (5)" to_port="example set input"/>
             <connect from_op="Filter Examples (5)" from_port="example set output" to_op="Select Attributes (5)" to_port="example set input"/>
             <connect from_op="Select Attributes (5)" from_port="example set output" to_port="example set"/>
             <portSpacing port="source_example set" spacing="0"/>
             <portSpacing port="sink_example set" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="set_macro" compatibility="5.2.008" expanded="true" height="76" name="Set Macro (2)" width="90" x="45" y="120">
           <parameter key="macro" value="example1"/>
           <parameter key="value" value="%{example1}"/>
         </operator>
         <connect from_port="example set" to_op="Set Macro (2)" to_port="through 1"/>
         <connect from_op="Read Excel (4)" from_port="output" to_op="Loop Examples (2)" to_port="example set"/>
         <connect from_op="Loop Examples (2)" from_port="example set" to_port="example set"/>
         <connect from_op="Set Macro (2)" from_port="through 1" to_port="output 1"/>
         <portSpacing port="source_example set" spacing="0"/>
         <portSpacing port="sink_example set" spacing="0"/>
         <portSpacing port="sink_output 1" spacing="0"/>
         <portSpacing port="sink_output 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Read Excel (2)" from_port="output" to_op="Loop Examples" to_port="example set"/>
     <connect from_op="Loop Examples" from_port="example set" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>



Best Answer

  • scepxkoscepxko Member Posts: 15 Maven
    Solution Accepted
    Hello Marius,

    it was so quick to implement that I feel a bit ashamed :-)
    Thank you very much.

    "Set Minus" removed from the main list the elements of the blacklist.
    Finally I preferred to keep the elements of the blacklist and remove the others. The operator "Intersect" gave the result.

    For information, here is the implementation of "Set Minus" for the other Rapidminer students here:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="753" width="949">
          <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="120">
            <parameter key="excel_file" value="E:\Rapidminer\Blacklisting\blacklist.xls"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Name.true.integer.id"/>
            </list>
          </operator>
          <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel (3)" width="90" x="45" y="30">
            <parameter key="excel_file" value="E:\Rapidminer\Blacklisting\to_filter.xls"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="A.true.integer.id"/>
              <parameter key="1" value="B.true.polynominal.attribute"/>
              <parameter key="2" value="C.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="set_minus" compatibility="5.2.008" expanded="true" height="76" name="Set Minus" width="90" x="246" y="30"/>
          <connect from_op="Read Excel (2)" from_port="output" to_op="Set Minus" to_port="subtrahend"/>
          <connect from_op="Read Excel (3)" from_port="output" to_op="Set Minus" to_port="example set input"/>
          <connect from_op="Set Minus" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

Answers

  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    just load both tables, set the role of the filter columns to "id" and use the Set Minus operator. This should greatly simplify your process :)

    Best, Marius
Sign In or Register to comment.