Merging examples

Ryujakk Member Posts: 17 Maven
edited November 2018 in Help
Hello,

My example set is similar to the one generated by this process:

<operator name="Root" class="Process" expanded="yes">
    <operator name="OperatorChain" class="OperatorChain" expanded="no">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="label is regular" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <operator name="BinDiscretization - 50" class="BinDiscretization">
            <parameter key="number_of_bins" value="50"/>
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="label is label" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
        </operator>
        <operator name="BinDiscretization - 2" class="BinDiscretization">
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="Nominal2Numerical (2)" class="Nominal2Numerical">
        </operator>
        <operator name="Sorting" class="Sorting">
            <parameter key="attribute_name" value="label"/>
        </operator>
    </operator>
</operator>
Basically, I have something like this:

label att1 att2 att3 att4 att5
range1 1.0 0.0 0.0 0.0 1.0
range1 0.0 0.0 1.0 1.0 0.0
range10 1.0 1.0 1.0 0.0 0.0
range11 1.0 0.0 0.0 1.0 0.0
range11 1.0 0.0 0.0 1.0 1.0
range11 1.0 0.0 0.0 1.0 1.0
....
I would like to merge all the "rangeX" examples so that, for each attribute, the maximum across all examples with the same label is kept. E.g., I want:

label att1 att2 att3 att4 att5
range1 1.0 0.0 1.0 1.0 1.0
range10 1.0 1.0 1.0 0.0 0.0
range11 1.0 0.0 0.0 1.0 1.0
....
I hope I'm clear here... Unfortunately, I don't have access to the data format, so I must do this crazy trick. I guess I could always write my own operator to do this, but I'm sure RapidMiner has all the necessary operators already available for this!
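
To pin down the intended result, the merge can be sketched outside RapidMiner as a group-by on the label with a per-attribute maximum. This is only a minimal Python sketch of the semantics, not a RapidMiner process; the function name `merge_by_max` and the tuple-per-row layout are assumptions made for illustration.

```python
def merge_by_max(rows):
    """Group rows by their first field (the label) and keep, for every
    remaining attribute, the maximum seen within each group.

    `rows` is an iterable of (label, att1, att2, ...) tuples; this layout
    is an assumption made for the sketch.
    """
    merged = {}  # label -> list of running maxima, in first-seen order
    for label, *values in rows:
        if label in merged:
            merged[label] = [max(a, b) for a, b in zip(merged[label], values)]
        else:
            merged[label] = list(values)
    return [(label, *values) for label, values in merged.items()]


# The sample rows from the table above.
rows = [
    ("range1", 1.0, 0.0, 0.0, 0.0, 1.0),
    ("range1", 0.0, 0.0, 1.0, 1.0, 0.0),
    ("range10", 1.0, 1.0, 1.0, 0.0, 0.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 0.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 1.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 1.0),
]

for row in merge_by_max(rows):
    print(*row)
```

For the six sample rows this yields one row per label, matching the three rows in the desired table.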

Thanks for any pointers  :)

- R

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    thank you for this excellent post. Solving a problem described in such a detailed manner is fun. So I will not only point you to the Aggregation operator, but I also have an example process for you:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="OperatorChain" class="OperatorChain" expanded="no">
            <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
                <parameter key="target_function" value="random"/>
            </operator>
            <operator name="label is regular" class="ChangeAttributeRole">
                <parameter key="name" value="label"/>
            </operator>
            <operator name="BinDiscretization - 50" class="BinDiscretization">
                <parameter key="number_of_bins" value="50"/>
                <parameter key="range_name_type" value="short"/>
            </operator>
            <operator name="label is label" class="ChangeAttributeRole">
                <parameter key="name" value="label"/>
                <parameter key="target_role" value="label"/>
            </operator>
            <operator name="Nominal2Numerical" class="Nominal2Numerical">
            </operator>
            <operator name="BinDiscretization - 2" class="BinDiscretization">
                <parameter key="range_name_type" value="short"/>
            </operator>
            <operator name="Nominal2Numerical (2)" class="Nominal2Numerical">
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="label"/>
            </operator>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <parameter key="keep_example_set" value="false"/>
            <list key="aggregation_attributes">
                <parameter key="att1" value="maximum"/>
                <parameter key="att2" value="maximum"/>
                <parameter key="att3" value="maximum"/>
                <parameter key="att4" value="maximum"/>
                <parameter key="att5" value="maximum"/>
            </list>
            <parameter key="group_by_attributes" value="label"/>
        </operator>
    </operator>
    Greetings,
      Sebastian
  • andkuo_7 Member Posts: 3 Contributor I

    Many years later, I had a similar (if not exactly the same) problem as the OP, and found the solution in this post.
