
newbie: Group by operator

Qingqiu Member Posts: 8 Contributor II
edited November 2018 in Help
Hi,
I have an example set with an attribute that assigns the examples to different bins (e.g. 1, 2, 3, ..., 10), and I want to divide the dataset into 10 subsets according to the bin index. I tried the Groupby operator, but the resulting example set is the same as the original. I also tried the SplittedExampleSet function but got the same result. Any suggestions? Thank you for any help! :)

Best Regards

Answers

  • colo Member Posts: 236 Maven
    Hi Qingqiu,

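    Maybe this is not the best way to solve your problem, but it's a simple one. You could use a "Multiply" operator combined with several "Filter Examples" operators to get specific subsets: Multiply passes a copy of the example set to each of its outputs, and each Filter Examples operator keeps only the examples that match its own condition. Here is a small example: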
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="386" width="480">
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="20"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="greatest_att" value="if(att2 &gt; att1, 2, 1)"/>
            </list>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.0.8" expanded="true" height="94" name="Multiply" width="90" x="179" y="210"/>
          <operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples (2)" width="90" x="313" y="300">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="greatest_att = 2"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="210">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="greatest_att = 1"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_port="result 2"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="180"/>
          <portSpacing port="sink_result 2" spacing="72"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    With this approach, every subset has to be set up manually in the process. If you have a larger number of subsets, you could instead create the groups automatically inside a loop that iterates over the values of the bin attribute.

    Regards,
    Matthias
  • Qingqiu Member Posts: 8 Contributor II
    Hi Matthias,
    Thank you so much for your help! :) It works and it is really simple. I focused too much on the Groupby operator and didn't even know there was a Loop Values operator... Thanks again! ;)

    Best Regards
    Qingqiu
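The loop-based idea from the thread (iterate over the distinct bin values and run one filter per value, instead of wiring one Filter Examples operator per subset by hand) can be sketched outside RapidMiner as follows. This is a minimal plain-Python sketch under stated assumptions: the example set is modeled as a list of dictionaries, and the names `filter_examples`, `split_by_attribute`, and the `bin` attribute are hypothetical illustrations, not RapidMiner's API.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for an example set: a list of rows (attribute name -> value).
# This is NOT RapidMiner's ExampleSet API, only an illustration of the logic.
Row = Dict[str, Any]


def filter_examples(rows: List[Row], attribute: str, value: Any) -> List[Row]:
    """Keep only rows whose `attribute` equals `value` (like the condition 'attribute = value')."""
    return [row for row in rows if row[attribute] == value]


def split_by_attribute(rows: List[Row], attribute: str) -> Dict[Any, List[Row]]:
    """Loop over the distinct values of `attribute` and apply one filter per value."""
    subsets: Dict[Any, List[Row]] = {}
    # sorted() yields a stable order for comparable bin indices such as 1..10
    for value in sorted({row[attribute] for row in rows}):
        subsets[value] = filter_examples(rows, attribute, value)
    return subsets


if __name__ == "__main__":
    # Hypothetical data: each example carries a 'bin' index.
    data = [
        {"id": 1, "bin": 2},
        {"id": 2, "bin": 1},
        {"id": 3, "bin": 2},
        {"id": 4, "bin": 3},
    ]
    for bin_index, subset in split_by_attribute(data, "bin").items():
        print(bin_index, [row["id"] for row in subset])
```

Running it prints one line per bin (`1 [2]`, `2 [1, 3]`, `3 [4]`). Filtering the whole set once per value mirrors the loop-plus-filter process structure; a single pass that appends each row to its bin's list would do the same work more cheaply on large data.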