"[SOLVED] Filter data from examples set"

tomkowski Member Posts: 3 Contributor I
edited June 2019 in Help
Hi,

I'm a beginner with RapidMiner, so as a first step I am trying to extract some data from an Access database, perform some operations on it, and display the result at the end.

I'm stuck on how to select some of the data from the data set.

What I do: create a repository with data from MS Access, then use Select Attributes to keep two text columns, A and B, and then Generate Attributes to create column C, which joins the strings from A and B. All columns contain words (text). For example, column A: "Gurund", column B: "Corporation", and column C: "Gurund Corporation". Of course, column B does not only contain "Corporation"; there are many other values as well.

Next I would like to keep only the rows that contain the word "Corporation" and display them. I tried different operators such as Filter Documents and Filter Examples, but I have not found one that helps. Do you have any suggestions?

Answers

  • fritmore Member Posts: 90 Contributor II
    Try the Filter Examples operator:
    condition class: attribute_value_filter
    parameter string: B="Corporation"
  • tomkowski Member Posts: 3 Contributor I
    Thank you for your answer.

    I tried this operator, but the problem is that the value in column B (or A) may be one or more words. For example, column B may contain "Corporation Europe" or "Corp.", which mean the same thing to me. I think the best solution would be an operator that accepts a regular expression, but I can't find anything like Filter Examples with regexp support. Or maybe I just don't know how to write the correct expression for the Filter Examples operator.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Heya,

    a rework of the Filter Examples operator is planned. Until then you have to use a workaround with Generate Attributes: it checks a condition and creates a new indicator attribute, on which you can then apply Filter Examples.

    Please have a look at the attached process.

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.005" expanded="true" name="Process">
        <process expanded="true" height="116" width="681">
          <operator activated="true" class="generate_nominal_data" compatibility="5.2.005" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="30">
            <parameter key="number_of_attributes" value="1"/>
          </operator>
          <operator activated="true" class="replace" compatibility="5.2.005" expanded="true" height="76" name="Replace" width="90" x="246" y="30">
            <parameter key="replace_what" value="value0"/>
            <parameter key="replace_by" value="Car Truck Moto"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.005" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="30">
            <list key="function_descriptions">
              <parameter key="indicator" value="matches(att1, &quot;.*Truck.*&quot;)"/>
            </list>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.2.005" expanded="true" height="76" name="Filter Examples" width="90" x="514" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="indicator=true"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
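
The two steps of the process above (compute an indicator, then filter on it) can be sketched outside RapidMiner as follows. This is only an illustration in Python, not the RapidMiner implementation: the sample rows, the helper names `add_indicator` and `filter_rows`, and the pattern for the "Corporation" / "Corp." case from the question are assumptions. It also assumes that `matches()` tests the whole value, as the `.*` on both sides of the pattern in the process suggests, which is why `fullmatch` is used.

```python
import re

# Hypothetical rows standing in for the example set; "B" mirrors the column from the question.
rows = [
    {"A": "Gurund", "B": "Corporation"},
    {"A": "Gurund", "B": "Corporation Europe"},
    {"A": "Gurund", "B": "Corp."},
    {"A": "Gurund", "B": "Ltd."},
]

# Whole-value pattern in the style of ".*Truck.*": it must cover the entire string.
# It accepts values containing "Corporation" or the abbreviation "Corp.".
PATTERN = re.compile(r".*Corp(oration|\.).*")


def add_indicator(row):
    # Step 1 (the role of Generate Attributes): add a boolean indicator column.
    return {**row, "indicator": PATTERN.fullmatch(row["B"]) is not None}


def filter_rows(rows):
    # Step 2 (the role of Filter Examples): keep rows whose indicator is true.
    return [r for r in map(add_indicator, rows) if r["indicator"]]


kept = filter_rows(rows)
print([r["B"] for r in kept])
```

With the sample rows, only the "Ltd." row is dropped. The pattern is case-sensitive, so a value such as "corporation" in lower case would not be kept unless the pattern is adjusted.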
  • fritmore Member Posts: 90 Contributor II
    Hi,
    I think I have used regular expressions before to filter examples CONTAINING a word.
    The regular expressions supported by RapidMiner are described here:

    http://rapid-i.com/wiki/index.php?title=Regular_expressions

    I am not sure whether regular expressions work in Filter Examples with attribute_value_filter, so give it a try.
    If not, they definitely work in Generate Attributes, as Marius suggested.
    Good luck!

  • tomkowski Member Posts: 3 Contributor I
    Hi All,

    Thanks, Marius, for your suggestion. I experimented with the Generate Attributes operator and got the desired result.
  • zahrahnnx Member Posts: 9 Contributor II
    Marius's process shows all rows that contain "Truck". What if we want to check that two words occur together? For example, "Truck" and "car" occurring next to each other or with 1 to 4 words in between, e.g. "... truck, (some words), car ...".

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    This is easily done with the Filter Examples operator in Studio 6.3: you specify the words you want, then choose at the bottom whether ALL of them must be included or whether ANY occurrence is sufficient.

    Regards,
    Marco
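
The ALL / ANY choice described above, and the "within 1 to 4 words" case from the question, can be illustrated with a short Python sketch. It is not how Studio implements the filter: the helpers `contains_words` and `truck_near_car` and the proximity pattern are assumptions made for illustration. The pattern only covers "truck" appearing before "car", and to reuse it in a whole-value `matches(...)` expression it would presumably need `.*` on both sides.

```python
import re


def contains_words(text, words, mode="all"):
    # Case-insensitive whole-word containment; mode is "all" or "any".
    hits = [
        re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) is not None
        for w in words
    ]
    return all(hits) if mode == "all" else any(hits)


# "truck" followed by "car" with at most four words in between.
NEAR = re.compile(r"\btruck\b(?:\W+\w+){0,4}?\W+car\b", re.IGNORECASE)


def truck_near_car(text):
    # True when "car" follows "truck" directly or after up to four other words.
    return NEAR.search(text) is not None


print(contains_words("Car Truck Moto", ["truck", "car"], mode="all"))
print(contains_words("Car Moto", ["truck", "car"], mode="any"))
print(truck_near_car("a truck, big red car here"))
```

The word boundaries (`\b`) keep "car" from matching inside longer words such as "carpet"; dropping them would turn the check into plain substring containment.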