newbie: Group by operator

Qingqiu Member Posts: 8 Contributor II
edited November 2018 in Help
Hi,
I have an example set with an attribute that labels the examples with a bin index (e.g. 1, 2, 3, ..., 10), and I want to divide my dataset into 10 subsets according to that index. I tried the Groupby operator, but the resulting example set is the same as the original. I also tried the SplittedExampleSet function and still got the same result. Any suggestions? Thank you for any help! :)

Best Regards

Answers

  • colocolo Member Posts: 236 Maven
    Hi Qingqiu,

    Maybe this is not the best way to solve your problem, but it's a simple one. You could use a "Multiply" operator combined with "Filter Examples" operators to get specific subsets. Here is a small example:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="386" width="480">
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="20"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="greatest_att" value="if(att2 &gt; att1, 2, 1)"/>
            </list>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.0.8" expanded="true" height="94" name="Multiply" width="90" x="179" y="210"/>
          <operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples (2)" width="90" x="313" y="300">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="greatest_att = 2"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="210">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="greatest_att = 1"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_port="result 2"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="180"/>
          <portSpacing port="sink_result 2" spacing="72"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    This way, every subset has to be set up by hand in the process, with one "Filter Examples" operator per value. If you have a larger number of subsets, you could perhaps create the groups automatically inside a loop.

    Regards,
    Matthias
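
    The loop idea mentioned above can be illustrated outside RapidMiner. The following is only a minimal sketch in plain Python, not a RapidMiner process: the function name split_by_value and the sample rows are hypothetical, and greatest_att is derived the same way as in the example process (2 if att2 > att1, otherwise 1). It collects the distinct values of the attribute and builds one filtered subset per value, which is roughly what a loop around "Filter Examples" would do.

```python
def split_by_value(rows, attribute):
    """Return one filtered subset per distinct value of `attribute`."""
    # Collect the distinct values once (the "loop" part).
    values = sorted({row[attribute] for row in rows})
    # For each value, keep only the matching rows (the "filter" part).
    return {v: [row for row in rows if row[attribute] == v] for v in values}


# Hypothetical sample data standing in for the generated example set.
rows = [
    {"att1": 0.5, "att2": 1.5},
    {"att1": 2.0, "att2": -1.0},
    {"att1": -3.0, "att2": 0.0},
]
for row in rows:
    # Same rule as the Generate Attributes step: if(att2 > att1, 2, 1)
    row["greatest_att"] = 2 if row["att2"] > row["att1"] else 1

subsets = split_by_value(rows, "greatest_att")
for value, subset in subsets.items():
    print(value, len(subset))
```

    With ten bins instead of two, the same function would return ten subsets without any further changes.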
  • Qingqiu Member Posts: 8 Contributor II
    Hi Matthias,
    Thank you so much for your help! :) It works and it is really simple. I focused too much on the Groupby operator and did not even know there was a Loop Values operator... Thanks again! ;)

    Best Regards
    Qingqiu