"Filter data (file.xls) as a function of five parameters (attributes)"

MattMatt Member Posts: 2 Contributor I
edited May 2019 in Help

Is there any way in RapidMiner 5 to filter data (file.xls) as a function of five parameters (attributes)? I am looking for the number of machines that are equipped with both item 1 and item 2, having lengths X and Y respectively.

attribute 1: item 1
attribute 2: feature of item 1 (e.g. length X)
attribute 3: item 2
attribute 4: feature of item 2 (e.g. length Y)
attribute 5: serial number of the machine

Result: number of machines meeting the same criteria (item 1 with length X + item 2 with length Y).

I have a large number of items, and the user should be able to choose any two items (with different specifications) and get as a result the number of machines equipped with these two items.

Should I use Generate Attributes, Loop, or something else?

Is RapidMiner the right tool for this query?
I am a newbie.
I would be glad to get any feedback.


    MattMatt Member Posts: 2 Contributor I

    Does no one have the slightest idea?

    Feel free to answer in German!
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn

    The Filter Examples operator might do what you want, although it seems to be limited to logical ANDs with two conditions.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="161" width="346">
          <!-- Load the Iris sample data from the repository -->
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <!-- Keep only examples where a1 > 7 AND a2 > 3 -->
          <operator activated="true" class="filter_examples" compatibility="5.1.006" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="75">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="a1&gt;7&amp;&amp;a2&gt;3"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
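
As a rough illustration of what the condition `a1>7&&a2>3` in the process above selects, here is a minimal Python sketch. The rows are made-up values, not the actual Iris data, and `keep_matching` is a hypothetical helper name. It only mimics the idea of keeping the examples for which both comparisons hold; it is not how RapidMiner evaluates the filter internally.

```python
# Hypothetical rows standing in for an example set; the values are invented
# for illustration and are NOT taken from the Iris sample data.
rows = [
    {"a1": 7.2, "a2": 3.6},
    {"a1": 7.7, "a2": 2.8},
    {"a1": 6.3, "a2": 3.3},
    {"a1": 7.9, "a2": 3.8},
]


def keep_matching(examples, a1_min, a2_min):
    """Keep only the examples where a1 > a1_min AND a2 > a2_min."""
    return [e for e in examples if e["a1"] > a1_min and e["a2"] > a2_min]


# Same thresholds as in the parameter string: a1 > 7 and a2 > 3.
filtered = keep_matching(rows, 7, 3)
print(filtered)
```

Only rows that satisfy both comparisons survive, which is the logical AND mentioned above.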

    SkirzynskiSkirzynski Member Posts: 164 Maven

    I don't know exactly what you want to do, but filtering data is certainly possible.

    First, load your data into RapidMiner, for example with the 'Read Excel' operator. You can then filter it with the 'Filter Examples' operator. If you choose 'attribute_value_filter' as the condition class, you can enter a condition such as "att1 > 20", so that only examples whose attribute 'att1' is greater than 20 are kept. Use the 'Generate Attributes' operator to create a new attribute, for example the sum of two attributes.

    After that, you can aggregate the filtered examples with the 'Aggregate' operator, which works similarly to SQL aggregation. To get the number of examples, use the aggregation function "count".

    I hope this helps. Otherwise, try posting a minimal example dataset and the desired result.
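
The filter-then-count idea described above can be sketched outside RapidMiner to make the expected result concrete. This is only a minimal Python sketch under assumptions: the column names (`item_1`, `length_1`, `item_2`, `length_2`, `serial`), the sample rows, and the helper `count_machines` are all hypothetical and simply mirror the five attributes from the question. It assumes a machine may appear in more than one row, so it counts distinct serial numbers.

```python
# Hypothetical rows mirroring the five attributes from the question.
# Column names and values are assumptions made for illustration only.
machines = [
    {"item_1": "belt", "length_1": 120, "item_2": "chain", "length_2": 80, "serial": "SN-001"},
    {"item_1": "belt", "length_1": 120, "item_2": "chain", "length_2": 60, "serial": "SN-002"},
    {"item_1": "belt", "length_1": 100, "item_2": "chain", "length_2": 80, "serial": "SN-003"},
    {"item_1": "belt", "length_1": 120, "item_2": "chain", "length_2": 80, "serial": "SN-004"},
    # Repeated machine: must not be counted twice.
    {"item_1": "belt", "length_1": 120, "item_2": "chain", "length_2": 80, "serial": "SN-001"},
]


def count_machines(rows, item_1, length_1, item_2, length_2):
    """Count distinct serial numbers whose row matches all four criteria."""
    serials = {
        r["serial"]
        for r in rows
        if r["item_1"] == item_1
        and r["length_1"] == length_1
        and r["item_2"] == item_2
        and r["length_2"] == length_2
    }
    return len(serials)


# The user picks two items and their lengths; the result is a machine count.
result = count_machines(machines, "belt", 120, "chain", 80)
print(result)
```

In RapidMiner terms, the comprehension plays the role of the filter step and `len(serials)` plays the role of the "count" aggregation.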
