"Filtering based on attribute values"

frankiefrankie Member Posts: 26 Contributor II
edited May 2019 in Help

While this might be a simple question, I have to ask: what is the best way to filter a dataset based on a subset of attribute values?
For example: let's say that I have a dataset with 3 attributes

attr1    range: [1,10]
attr2    range: [1,20]
attr3    range: [1,30]

and that I want to filter out those examples that have either
attr1 > 9 OR attr2 > 18 OR attr3 < 5.

Can these "outliers" be filtered with one operator? How?


    haddockhaddock Member Posts: 849 Maven
    Er... you can use 'Filter Examples'. Here is some stuff from the help:
    Please note that you can define a logical OR of several conditions with || and a logical AND of two conditions with two ampersands (condition1 && condition2), or simply by applying several ExampleFilter operators in a row. Please note also that for nominal attributes you can define a regular expression as the value for the equal and not-equal checks.
    To filter examples (i.e. rows) by whether an attribute "att" has a missing value, use the expression "att = ?" (missing) or "att != ?" (not missing). Note that for nominal values the question mark must be escaped ("\?") because, as noted above, a regular expression is expected in this case.
    For "unknown_attributes" the parameter string must be empty. This filter removes all examples containing attributes that have missing or illegal values. For "unknown_label" the parameter string must also be empty. This filter removes all examples with an unknown label value.
    and here is an example....
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.003">
     <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
       <process expanded="true" height="370" width="346">
         <operator activated="true" class="generate_data" compatibility="5.1.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="165"/>
         <operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="210">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="att1&lt;0 || att2 &gt; 0"/>
           <parameter key="invert_filter" value="true"/>
         </operator>
         <connect from_op="Generate Data" from_port="output" to_op="Filter Examples" to_port="example set input"/>
         <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
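    Applied to the original question, the same operator might be configured roughly as follows. This is only a sketch: it assumes the attributes are really named attr1, attr2 and attr3 as in the question, and that the attribute_value_filter syntax shown in the example above accepts three conditions joined with ||. It has not been run against a real dataset.

    ```xml
    <operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" name="Filter Examples">
      <!-- Assumption: attribute names attr1..attr3 are taken from the question, not from a real dataset -->
      <parameter key="condition_class" value="attribute_value_filter"/>
      <!-- The condition matches the "outliers"; > and < must be escaped inside the XML attribute value -->
      <parameter key="parameter_string" value="attr1 &gt; 9 || attr2 &gt; 18 || attr3 &lt; 5"/>
      <!-- invert_filter=true keeps the examples that do NOT match the condition, i.e. it drops the outliers -->
      <parameter key="invert_filter" value="true"/>
    </operator>
    ```

    Leaving invert_filter at false instead would keep only the outliers, which can be handy for inspecting what is being thrown away.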
    frankiefrankie Member Posts: 26 Contributor II
    Thanks, and sorry for not reading the entire help file. I just thought the "Filter Examples" operator looked too simple with only one input field, hence I disregarded it... :)
    haddockhaddock Member Posts: 849 Maven
    Easily done, and it is not the world's raciest read  ;D

    Good weekend!
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Well, if anybody finds a better description of what it does: just edit it on the wiki... I'm very open to all phrases of literary value that still help to understand what an operator does :)
