Aggregate operator applied to each subset

SylvainM Member Posts: 18  Maven
edited August 2020 in Help
Hello everyone,

I apologize if this question has already been asked elsewhere, or if it is an obvious one. I'm still learning how to use RapidMiner :smile:

Here is my problem. Suppose I have a dataset that looks like this (but with many more distinct values):

Year  Region  Item
01    QC      CCD
01    QC      CCD
01    QC      CS
01    ON      CCD
01    ON      CS
01    NB      CCD
01    NB      CS
02    QC      CCD
02    QC      CS
02    QC      CS
02    ON      CS
02    ON      CS
02    NB      CCD
02    NB      CCD

I would like to get the relative percentage of each Item within each Region and Year:

Year  Region  Item  Proportion
01    QC      CCD   66.6%
01    QC      CS    33.3%
01    ON      CCD   50%
01    ON      CS    50%
01    NB      CCD   50%
01    NB      CS    50%
02    QC      CCD   33.3%
02    QC      CS    66.6%
02    ON      CS    100%
02    NB      CCD   100%

I tried many combinations of the Aggregate, Loop Values, Branch, and other operators, but I keep failing...

Do you have any suggestions?

Thanks a lot!

Best Answer


  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 709   Unicorn

    What you're trying to achieve is called "window functions" in SQL.
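For readers who have not met window functions before, here is a minimal sketch of the idea outside RapidMiner, using Python's built-in sqlite3 module (it needs SQLite 3.25 or newer for window function support). The data comes from the question; the table name `items` and the lowercase column names are assumptions made for this illustration only.

```python
import sqlite3

# Sample rows from the question: (year, region, item)
rows = [
    ("01", "QC", "CCD"), ("01", "QC", "CCD"), ("01", "QC", "CS"),
    ("01", "ON", "CCD"), ("01", "ON", "CS"),
    ("01", "NB", "CCD"), ("01", "NB", "CS"),
    ("02", "QC", "CCD"), ("02", "QC", "CS"), ("02", "QC", "CS"),
    ("02", "ON", "CS"), ("02", "ON", "CS"),
    ("02", "NB", "CCD"), ("02", "NB", "CCD"),
]

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical table name chosen for this sketch.
conn.execute("CREATE TABLE items (year TEXT, region TEXT, item TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# Each COUNT(*) OVER (PARTITION BY ...) is computed per row over its group,
# without collapsing the rows. DISTINCT then removes the duplicate result rows.
query = """
SELECT DISTINCT
    year, region, item,
    ROUND(100.0 * COUNT(*) OVER (PARTITION BY year, region, item)
                / COUNT(*) OVER (PARTITION BY year, region), 1) AS proportion
FROM items
ORDER BY year, region, item
"""
proportions = conn.execute(query).fetchall()

for year, region, item, pct in proportions:
    print(year, region, item, f"{pct}%")
```

The numerator counts rows per (year, region, item) and the denominator counts rows per (year, region), so their ratio is the share of each item within its region and year.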

    You should check out this project; it's an implementation of window functions in RapidMiner.

    You can calculate groupwise sums or counts, generate the ratios, and then aggregate according to your needs. 
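The steps just described (groupwise counts, then ratios) can be sketched in plain Python to make the arithmetic concrete. This is only an illustration of the logic, not a RapidMiner process; the variable names are made up for the example, and the data is the sample from the question.

```python
from collections import Counter

# Sample rows from the question: (year, region, item)
rows = [
    ("01", "QC", "CCD"), ("01", "QC", "CCD"), ("01", "QC", "CS"),
    ("01", "ON", "CCD"), ("01", "ON", "CS"),
    ("01", "NB", "CCD"), ("01", "NB", "CS"),
    ("02", "QC", "CCD"), ("02", "QC", "CS"), ("02", "QC", "CS"),
    ("02", "ON", "CS"), ("02", "ON", "CS"),
    ("02", "NB", "CCD"), ("02", "NB", "CCD"),
]

# Step 1: groupwise counts at two levels of detail.
item_counts = Counter(rows)                                   # per (year, region, item)
group_totals = Counter((year, region) for year, region, _ in rows)  # per (year, region)

# Step 2: ratio of each item count to the total of its (year, region) group.
proportions = {
    (year, region, item): 100 * count / group_totals[(year, region)]
    for (year, region, item), count in item_counts.items()
}

for (year, region, item), pct in sorted(proportions.items()):
    print(f"{year}  {region}  {item:<4} {pct:5.1f}%")
```

In RapidMiner terms, this roughly corresponds to aggregating twice with different grouping attributes, joining the two results on Year and Region, and dividing one count by the other; the exact operators to use depend on your process.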


  • SylvainM Member Posts: 18  Maven
    Thanks BalazsBarany and SGolbert for your help,

    Your solution, Sebastian, is perfect! Thank you so much! :smiley:

    Best regards to both of you,

