Calculation by Groups

ShubhaShubha Member Posts: 139 Maven
edited November 2018 in Help
Hi,

This is a newbie's question. I have data that looks like this:

Group Value
1 78
1 64
1 75
2 66
2 54
2 72
2 77
3 57
3 59
3 61

Now I want to do a calculation by groups: ABS("Value" - Group Average). The output data file should look like this:

Group Value Result
1 78 5.666666667
1 64 8.333333333
1 75 2.666666667
2 66 1.25
2 54 13.25
2 72 4.75
2 77 9.75
3 57 2
3 59 0
3 61 2


This is exactly group processing. How do we do this in RapidMiner?

Many thanks for your help,
Shubha

Answers

  • ShubhaShubha Member Posts: 139 Maven
    Would the "Attribute Constructor" or the "ValueSubgroupIterator" help me in this respect?
  • ShubhaShubha Member Posts: 139 Maven
    Does this take too many steps to do in RapidMiner?

    BR, Shubha
  • IngoRMIngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    it's actually pretty simple:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="Generate Data" class="OperatorChain" expanded="no">
            <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
                <parameter key="target_function" value="sum"/>
                <parameter key="number_examples" value="12"/>
                <parameter key="number_of_attributes" value="2"/>
            </operator>
            <operator name="AttributeFilter" class="AttributeFilter">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="parameter_string" value="label"/>
                <parameter key="invert_filter" value="true"/>
                <parameter key="apply_on_special" value="true"/>
            </operator>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="attribute_name_regex" value="att1"/>
                <operator name="FrequencyDiscretization" class="FrequencyDiscretization">
                    <parameter key="number_of_bins" value="3"/>
                    <parameter key="range_name_type" value="short"/>
                </operator>
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="att1"/>
            </operator>
        </operator>
        <operator name="ValueIterator" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="att1"/>
            <operator name="ExampleFilter" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="att1 = %{loop_value}"/>
            </operator>
            <operator name="Aggregation" class="Aggregation">
                <list key="aggregation_attributes">
                  <parameter key="att2" value="average"/>
                </list>
            </operator>
            <operator name="DataMacroDefinition" class="DataMacroDefinition">
                <parameter key="macro" value="current_average"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="average(att2)"/>
                <parameter key="example_index" value="1"/>
            </operator>
            <operator name="IOConsumer" class="IOConsumer">
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="deletion_type" value="delete_one"/>
            </operator>
            <operator name="AttributeConstruction" class="AttributeConstruction">
                <list key="function_descriptions">
                  <parameter key="att2_abs_avg" value="abs(att2 - %{current_average})"/>
                </list>
            </operator>
        </operator>
        <operator name="ExampleSetMerge" class="ExampleSetMerge">
        </operator>
    </operator>

    Please note that the first chain is only used for generating data like you have described. The ValueIterator together with the aggregation and the merge do the trick.

    All the best,
    Ingo
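The loop Ingo describes (iterate over each distinct group value, filter, aggregate, construct the new attribute, merge) can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not a RapidMiner API: the function name `abs_deviation_by_group` and the tuple layout are assumptions made for this example, and the data is the sample from the original question.

```python
# Sample data from the question: (group, value) pairs.
rows = [
    (1, 78), (1, 64), (1, 75),
    (2, 66), (2, 54), (2, 72), (2, 77),
    (3, 57), (3, 59), (3, 61),
]


def abs_deviation_by_group(rows):
    """Return (group, value, abs(value - group average)) for every row.

    Hypothetical helper that mirrors the process steps:
    ValueIterator -> ExampleFilter -> Aggregation -> AttributeConstruction -> merge.
    """
    merged = []
    # Loop over the distinct group values (the role of ValueIterator).
    for group in sorted({g for g, _ in rows}):
        # Keep only the rows of the current group (ExampleFilter).
        subset = [(g, v) for g, v in rows if g == group]
        # Average of the value column for this group (Aggregation + macro).
        average = sum(v for _, v in subset) / len(subset)
        # Build the new column and append to the combined result
        # (AttributeConstruction followed by ExampleSetMerge).
        merged.extend((g, v, abs(v - average)) for g, v in subset)
    return merged


result = abs_deviation_by_group(rows)
for group, value, deviation in result:
    print(group, value, round(deviation, 9))
```

For the sample data the group averages are 72.333..., 67.25 and 59, so the computed column should match the "Result" column in the question.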
  • ShubhaShubha Member Posts: 139 Maven
    Thank you very much Ingo...

    A question is:
    Can I do this without splitting the data into 3 parts (according to the groups)?
  • IngoRMIngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Yes, with a different (and more complex) setup in RM 4.4 (coming this week) it would also be possible without "dividing" the data. But what's wrong with dividing it? The different data parts are only views on the same data set, so they should be no problem at all.

    Cheers,
    Ingo
  • ShubhaShubha Member Posts: 139 Maven
    Oh, is that so? They are views, and not 3 different data sets being generated? So it would not consume extra memory for my large data?

    Thanks,
    Shubha
  • ShubhaShubha Member Posts: 139 Maven
    Hi,

    One last question,

    When I join the datasets at the end with "ExampleSetMerge", the group order is changed. How can I get back the original group order while merging? Is there a way to reverse the order of the 3 different datasets generated?

    Thanks,
    Shubha
  • IngoRMIngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Well, a little bit of digging around yourself would probably not hurt: in the process I sent to you, there already was a Sorting operator at the end of the data generation process. Use another one directly after the merge.

    Cheers,
    Ingo
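The idea of sorting after the merge can be sketched in plain Python as well. This is a minimal illustration under assumptions, not RapidMiner code: the `parts` list stands in for the per-group example sets arriving at the merge in a changed order, and the rows are the sample data from the question.

```python
# Per-group parts as they might arrive at the merge, here in reversed order.
parts = [
    [(3, 57), (3, 59), (3, 61)],
    [(2, 66), (2, 54), (2, 72), (2, 77)],
    [(1, 78), (1, 64), (1, 75)],
]

# Merge: concatenate the parts in the order they arrive.
merged = [row for part in parts for row in part]

# Sort by the group column after the merge. Python's sort is stable,
# so rows within a group keep their relative order.
restored = sorted(merged, key=lambda row: row[0])

# The same call with reverse=True yields descending group order,
# still keeping the within-group order of the rows.
descending = sorted(merged, key=lambda row: row[0], reverse=True)

print(restored)
print(descending)
```

Sorting on the group attribute alone only restores the group order. If the original row order within the data set matters beyond that, an explicit row id would have to be carried along and used as the sort key instead.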
  • ShubhaShubha Member Posts: 139 Maven
    Thanks Ingo... Sorting is fine... But actually my problem is,

    We have the generated split datasets. During the join itself, can we join them in order? Or at least in descending order? This would help with another problem of mine...


    Thanks, Shubha