Filtering rows and columns

Dr_Van_NostrandDr_Van_Nostrand Member Posts: 1 Contributor I
edited November 2018 in Help
Hi,

Maybe this belongs under a different thread but anyway: is there an operator for selectively filtering out (i.e. remove) one or more rows or columns from a table with data? This is possible in Knime, but Knime lacks other features that RM has.

Feedback is much appreciated!
sgenzerrm1971

Best Answer

Answers

  • qwertzqwertz Member Posts: 130  Maven

    Hi Marius,

    At first I want to say THANK YOU at this time for your fast and competent help with all my issues the last days!
    I'm really trying to find answers in this forum and the documentation/wiki first.


    I checked the "filter examples" operator you mentioned. Especially "attribute_value_filter" shows manifold capabilities to remove rows.

    Though, I still found no solution to remove all rows but the last one with this operator.
  • haddockhaddock Member Posts: 849  Guru
    Hi there,

    You can count the examples into a macro and used that as your range start and end, like this.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="349" width="681">
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="9" y="6"/>
          <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
            <parameter key="macro" value="exs"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="331" y="23">
            <parameter key="first_example" value="%{exs}"/>
            <parameter key="last_example" value="%{exs}"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    HTH
  • qwertzqwertz Member Posts: 130  Maven

    Hi haddock,

    Thanks for sharing. Indeed this solution works quite effective.

    To be a somewhat more flexible in my process I tried another setup which can provide the last x rows of an example set by doing calculation with the macro's value. Unfortunatelly this doesn't work in the "filter example" operator (e.g. "%{exs}-5" as "first example" value).

    Oh dear... I thougt it was so simple - now it drives me nuts  :(


    Sincerely
    Sachs
  • haddockhaddock Member Posts: 849  Guru
    Hi there,

    It's quite simple, I'll leave it to you, think loop counters.
  • qwertzqwertz Member Posts: 130  Maven
    Sorry, but I still don't get it. Tried the following process but without success... and still not even a clue how to make it  :'(

    This process delivers x times the (same) last row and provides a collection instead of a single example set...

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <process expanded="true" height="450" width="647">
         <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
         <operator activated="true" class="loop" compatibility="5.2.003" expanded="true" height="76" name="Loop" width="90" x="179" y="30">
           <parameter key="set_iteration_macro" value="true"/>
           <parameter key="iterations" value="5"/>
           <process expanded="true" height="450" width="647">
             <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
               <parameter key="macro" value="%{abc}"/>
             </operator>
             <operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="179" y="30">
               <parameter key="first_example" value="%{abc}"/>
               <parameter key="last_example" value="%{abc}"/>
             </operator>
             <connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
             <connect from_op="Extract Macro" from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
             <connect from_op="Filter Example Range" from_port="example set output" to_port="output 1"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
             <portSpacing port="sink_output 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Generate Data" from_port="output" to_op="Loop" to_port="input 1"/>
         <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>

    Bye for now
    Sachs
  • haddockhaddock Member Posts: 849  Guru
    Hi,

    I should have been more explicit about the count. The last example we know; if you take five from the number of examples
    you'll filter to six examples ( think about the loop count ), so you need to add one back, like this.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="450" width="748">
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
            <parameter key="macro" value="%{last}"/>
          </operator>
          <operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="380" y="30">
            <list key="function_descriptions">
              <parameter key="first" value="%{last}-5+1"/>
            </list>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="648" y="30">
            <parameter key="first_example" value="%{first}"/>
            <parameter key="last_example" value="%{last}"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
          <connect from_op="Generate Macro" from_port="through 1" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    The more you use RM, the more those pesky macros show up!
  • qwertzqwertz Member Posts: 130  Maven

    Wow, that's it! Indeed simple - if you once know how it works  ;D

    Thank you so much!  ;)
  • haddockhaddock Member Posts: 849  Guru
    Indeedy!

    The biggest mistake that I see being made on this Forum is that people underestimate RapidMiner, it is a deep and mature environment with many man years behind it. And these are seriously qualified enthusiastic man years, so the chances are that most likely tasks are covered, as you say if only you know how. It's a big toolkit, and it takes time.

    Stick with it!
  • marasmamarasma Member Posts: 2 Contributor I
    Hi there,
    I've got a similar problem and after two days of work, no solutions...
    I need to remove the attributes whose sum (on its column) is < 10. How can I do? I only have numeric values under these attributes...
    every feedback is very welcome! :-[
  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    edited January 16
    Hi marasma,

    I would not necessarily call this a similar problem, but try the process below. It calculates the sum of each column using Aggregate with a default aggregation, then cleans the filenames to match the original name, then selects attribute which some up to at least 0 in this example, and finally the filtered data is used as reference data to filter the original data in Reorder Attributes.

    I propose to use breakpoints after each operator to understand what's going on.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.008">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key>threshold</key>
            <value>0</value>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="6.0.008" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30"/>
          <operator activated="true" class="aggregate" compatibility="6.0.008" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
            <parameter key="use_default_aggregation" value="true"/>
            <parameter key="default_aggregation_function" value="sum"/>
            <list key="aggregation_attributes"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="6.0.008" expanded="true" height="76" name="Rename by Replacing" width="90" x="380" y="30">
            <parameter key="replace_what" value=".*\((.*)\)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="6.0.008" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="30">
            <parameter key="attribute_filter_type" value="numeric_value_filter"/>
            <parameter key="numeric_condition" value="&gt; %{threshold}"/>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="6.0.008" expanded="true" height="76" name="Reorder Attributes" width="90" x="648" y="165">
            <parameter key="sort_mode" value="reference data"/>
            <parameter key="handle_unmatched" value="remove"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="original" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Reorder Attributes" to_port="reference_data"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Here is a link to this process in the Community Repository.
Sign In or Register to comment.