"[solved] Routing? branching? sorting?"

greg (Member, Contributor II)
edited June 2019 in Help
Hello,
I'm a complete newbie to RapidMiner, and I'm trying to find out whether I can use it for my needs.

My goal is to "route" a row based on a column's value. For example, in a people database, rows for males would go to a first database and rows for females to a second database.

How can I achieve that?

TIA

greg

Answers

  • MariusHelf (RapidMiner Certified Expert, Member, Unicorn)
    Hi Greg,

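    That's quite easy with RapidMiner: just use the Filter Examples operator, once per target. In the example process below, a Multiply operator first makes one copy of the data for each filter, and each Filter Examples operator then keeps only the rows whose att1 matches its value. To try it, copy the XML below into the XML view of your RapidMiner instance and press the green button at the top (beware: this overwrites your current process).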

    Instead of the Store operators at the end, you could use Write Database operators to write each subset to its own database.

    Best,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.015" expanded="true" name="Process">
        <process expanded="true" height="251" width="547">
          <operator activated="true" class="generate_nominal_data" compatibility="5.1.015" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
            <parameter key="number_of_values" value="2"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.1.015" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
          <operator activated="true" class="filter_examples" compatibility="5.1.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="313" y="120">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="att1 = value2"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.015" expanded="true" height="60" name="Store (2)" width="90" x="447" y="120">
            <parameter key="repository_entry" value="value2_data"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.1.015" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="att1 = value1"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.015" expanded="true" height="60" name="Store" width="90" x="447" y="30">
            <parameter key="repository_entry" value="value1_data"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Store (2)" to_port="input"/>
          <connect from_op="Store (2)" from_port="through" to_port="result 2"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
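For readers who would otherwise write a script, here is a rough plain-Python equivalent of the process above, for comparison only. It is a sketch, not RapidMiner code: the `filter_rows` helper, the `gender` column and the sample rows are hypothetical, and the database write step is left out.

```python
# Sketch: route rows into separate outputs by a column value, in plain Python.
# Hypothetical: filter_rows, the "gender" column and the sample rows are made up.

def filter_rows(rows, column, value):
    """Keep only rows whose `column` equals `value` (like an 'att1 = value1' filter)."""
    return [row for row in rows if row.get(column) == value]

people = [
    {"name": "Ann", "gender": "female"},
    {"name": "Bob", "gender": "male"},
    {"name": "Cid", "gender": "male"},
]

# Each filter reads the full list, much as each Filter Examples operator
# receives its own copy of the data from Multiply.
males = filter_rows(people, "gender", "male")
females = filter_rows(people, "gender", "female")

print([p["name"] for p in males])    # ['Bob', 'Cid']
print([p["name"] for p in females])  # ['Ann']
```

As in the process, a row whose value matches neither filter simply ends up in neither output.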
  • greg (Member, Contributor II)
    Thanks, that's exactly what I was looking for. I tried the Branch operator because I thought it could be done with a single operator; in fact, what I was missing was the Multiply operator.
    BTW, what is the Branch operator for?

    TIA

    greg
  • MariusHelf (RapidMiner Certified Expert, Member, Unicorn)
    Hi,

    You could actually also build the process with the Branch operator; RapidMiner is about choices and possibilities :) Branch evaluates a condition: if it is fulfilled, it executes the first subprocess, otherwise the second one.

    BTW, what does "TIA" mean?

    Best,

    Marius
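To illustrate how branching differs from filtering, here is a minimal Python sketch of the idea described above: the condition is checked once, and the whole input goes to one of two subprocesses. `branch` and `route_with_branch` are hypothetical helpers, not RapidMiner APIs, and the data is made up. Routing individual rows with a branch means evaluating it once per row, as the second half shows.

```python
# Sketch of the branching idea: one condition, evaluated once, picks one of two
# subprocesses for the whole input. `branch` and `route_with_branch` are
# hypothetical helpers, not RapidMiner APIs.

def branch(data, condition, then_process, else_process):
    """Run then_process(data) if condition(data) holds, else else_process(data)."""
    if condition(data):
        return then_process(data)
    return else_process(data)

people = [
    {"name": "Ann", "gender": "female"},
    {"name": "Bob", "gender": "male"},
    {"name": "Cid", "gender": "male"},
]

# Applied to the whole list, the branch sends ALL rows down one path.
result = branch(people, lambda rows: len(rows) > 0,
                lambda rows: f"stored {len(rows)} rows",
                lambda rows: "nothing to store")
print(result)  # stored 3 rows

# To route individual rows, the branch has to be evaluated once per row.
def route_with_branch(rows, column, value):
    matching, others = [], []
    for row in rows:
        branch(row, lambda r: r[column] == value, matching.append, others.append)
    return matching, others

males, others = route_with_branch(people, "gender", "male")
print([p["name"] for p in males])   # ['Bob', 'Cid']
print([p["name"] for p in others])  # ['Ann']
```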
  • greg (Member, Contributor II)
    Thanks for your answer.

    I'm currently evaluating RapidMiner as an alternative to writing scripts for handling lists of users. Right now I find RapidMiner harder to use, but of course that's because I don't know it yet. I tried Talend for the same purpose some time ago and gave up on it.
    Like all beginners, I guess, I'm looking for starter material: tutorials, books, etc. The only tutorials I found were very specific and didn't help me much.

    There are cases where I just cannot see how to do something in RapidMiner. For example, I want to remove duplicates based on a criterion; how can I find out how to do that?

    TIA means "thanks in advance" ;)

    greg
  • MariusHelf (RapidMiner Certified Expert, Member, Unicorn)
    What do you mean by a "criterion" for removing duplicates? If you mean that two examples count as duplicates when some of their attributes match, you can use the Remove Duplicates operator and select the attributes to compare with its attribute filter.

    Best,
    Marius
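A minimal Python sketch of what "duplicates by a subset of attributes" means: two rows count as duplicates when they agree on the chosen columns, and only the first is kept. The function name, column names and data are hypothetical, and keeping the first occurrence is this sketch's choice; whether RapidMiner's Remove Duplicates behaves the same way is worth checking in its documentation.

```python
# Sketch: remove duplicates where "duplicate" means the rows agree on a chosen
# subset of columns. Function name, column names and data are hypothetical.

def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    unique = []
    for row in rows:
        # Missing columns yield None, so in this sketch missing values compare equal.
        key = tuple(row.get(col) for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

users = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann L.",  "email": "ann@example.com"},
    {"name": "Bob",     "email": "bob@example.com"},
]

deduped = remove_duplicates(users, ["email"])
print([u["name"] for u in deduped])  # ['Ann Lee', 'Bob']
```

Passing ["name", "email"] instead would keep all three rows, since no two rows agree on both columns.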