Filter "Top K" samples

mataio Member Posts: 6 Contributor I
Hello everybody,

I have a question about filtering samples. I would like to keep only the samples that fall in the top 10% of attribute X. I know the "Filter Examples" operator can filter on X, but as far as I know it only accepts a static value as the condition, such as X>=1.

Does anybody know a way to tackle my problem?

Thanks in advance 

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,022  RM Data Scientist
    Hi there,

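    You can combine Sort, Generate ID, and Filter Examples to extract the top k examples by attribute X: sort by X in descending order, number the sorted examples with Generate ID, and keep only the examples whose ID is at most k. For the top k %, k depends on the size of the example set; you can either enter that number yourself or compute it inside the process using Aggregate and Extract Macro.

    Attached is an example process that selects the top 3 values of a1 in the Iris data set: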
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.1.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="120">
           <parameter key="repository_entry" value="//Samples/data/Iris"/>
         </operator>
         <operator activated="true" class="sort" compatibility="6.1.000" expanded="true" height="76" name="Sort" width="90" x="246" y="120">
           <parameter key="attribute_name" value="a1"/>
           <parameter key="sorting_direction" value="decreasing"/>
         </operator>
         <operator activated="true" class="generate_id" compatibility="6.1.000" expanded="true" height="76" name="Generate ID" width="90" x="380" y="120"/>
         <operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="581" y="120">
           <list key="filters_list">
             <parameter key="filters_entry_key" value="id.lt.4"/>
           </list>
         </operator>
         <connect from_op="Retrieve Iris" from_port="output" to_op="Sort" to_port="example set input"/>
         <connect from_op="Sort" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
         <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
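
For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same idea: sort in descending order, number the rows from 1, and keep the rows whose ID is at most k. For the top k %, k is derived from the number of rows. The function names `top_k` and `top_percent`, the sample values, and the choice to round k up are assumptions made for illustration; none of them come from the process above.

```python
import math


def top_k(rows, attribute, k):
    """Return the k rows with the largest values of `attribute`."""
    # Sort in descending order so the largest values come first.
    ranked = sorted(rows, key=lambda row: row[attribute], reverse=True)
    # Number the sorted rows from 1 (the role Generate ID plays) and keep id <= k.
    return [row for row_id, row in enumerate(ranked, start=1) if row_id <= k]


def top_percent(rows, attribute, percent):
    """Return the top `percent` percent of rows by `attribute`."""
    # Derive k from the row count; rounding up is an assumption of this sketch.
    k = math.ceil(len(rows) * percent / 100)
    return top_k(rows, attribute, k)


# Hypothetical sample data, not the Iris data set.
samples = [{"X": v} for v in [5.1, 4.9, 7.0, 6.4, 5.8, 6.3, 4.7, 5.0, 6.9, 5.5]]

top_three = top_k(samples, "X", 3)
top_ten_percent = top_percent(samples, "X", 10)
```

Whether to round k up or down when the percentage does not give a whole number is a design choice; rounding up guarantees at least one row for any non-empty input.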