"Filter/select specific rows from set"

MRon Member Posts: 3 Contributor I
edited May 2019 in Help
Hello!

I select 500 rows from a DB. The set is simple: it has two "columns" (attributes?). Both are text fields; the first one has the label role and contains car brand names.
My question is: how can I remove/filter/delete the rows whose brand appears in my set fewer than 10 times?
I would like to achieve this directly in RapidMiner, not at the DB level.

Cheers!

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,995 RM Engineering
    Hi,

    Unfortunately, this is currently not as easy as we would like.
    However, it is possible ;)
    I don't have your data, so I made an example process to illustrate how it can be done:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="346" width="949">
          <operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="165">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.008" expanded="true" height="76" name="Set Role (2)" width="90" x="447" y="165">
            <parameter key="name" value="Outlook"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.1.008" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
            <list key="aggregation_attributes">
              <parameter key="Outlook" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="|Outlook"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.1.008" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="count(Outlook)&gt;4"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.008" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
            <parameter key="name" value="Outlook"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="join" compatibility="5.1.008" expanded="true" height="76" name="Join" width="90" x="648" y="120"/>
          <connect from_op="Retrieve" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Note that you will need to adapt the process to your own setup (changing a lot of parameters), but that shouldn't be too hard.

    Regards,
    Marco
  • MRon Member Posts: 3 Contributor I
    Thank you for your answer!
    Marco Boeck wrote:

    unfortunately, this is currently not as easy as we would like..
    This is no problem for me :). I just thought I had missed an operator that does this.
    Marco Boeck wrote:

    I don't have your data, so I made an example process to illustrate how it can be done:
    Could you tell me what Select Attributes is for in your example? At first glance, I got the same results without it.

    Cheers!
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,995 RM Engineering
    Hi,

    you're right, that operator is not needed. I forgot to remove it ;)

    Regards,
    Marco
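
For readers who want to see the logic of Marco's process outside RapidMiner, here is a minimal Python sketch of the same three steps: count occurrences per value (Aggregate), keep the values that occur often enough (Filter Examples), and match the surviving values back to the original rows (Join). The function name `keep_frequent_rows`, its parameters, and the sample car data are all hypothetical and only serve as illustration; this is not part of the RapidMiner process itself.

```python
from collections import Counter


def keep_frequent_rows(rows, key_index=0, min_count=10):
    """Keep rows whose key value occurs at least min_count times (hypothetical helper)."""
    # Step 1 (Aggregate): count how often each key value occurs.
    counts = Counter(row[key_index] for row in rows)
    # Step 2 (Filter Examples): keep only the key values that are frequent enough.
    frequent = {value for value, n in counts.items() if n >= min_count}
    # Step 3 (Join): keep the original rows whose key value survived the filter.
    return [row for row in rows if row[key_index] in frequent]


# Hypothetical sample data: (brand, model) pairs, brand being the label column.
rows = [("Audi", "A4")] * 12 + [("Fiat", "500")] * 3 + [("BMW", "X5")] * 10
result = keep_frequent_rows(rows)
print(Counter(brand for brand, _ in result))
```

With this sample data, the three Fiat rows are dropped because the brand occurs fewer than 10 times, while all Audi and BMW rows are kept. Note that the sketch keeps values with at least `min_count` occurrences, whereas the example process above uses a strict "greater than 4" condition on the Golf data, so the threshold needs to be adapted accordingly.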