How to automatically aggregate the numerical values of every 10/20/30... columns

cindyliu_au Member Posts: 6 Newbie
The original data has 300 attributes, from day1 to day300.

I need to create 3 datasets with generated features (each row is still one student (id)):

dataset1: feature generation: aggregate every 10 days, resulting in 30 attributes (day1-10, day11-20...)
dataset2: feature generation: aggregate every 20 days, resulting in 15 attributes (day1-20, day21-40...)
dataset3: feature generation: aggregate every 30 days, resulting in 10 attributes (day1-30, day31-60...)

I know I can use the Generate Attributes operator and manually select day1 to day10, then day11 to day20, and so on,
but I want to know how to generate these aggregated features automatically.

Thank you!

Best Answer

    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted

    Here's an automatic solution. It transposes a copy of the data, so that day1-dayN become rows. It then processes these rows in batches using Loop Batches. You only need to enter the number of elements per batch in the batch size parameter. I tested with different values; it works with every setting >= 2.

    Inside each batch, the process generates a macro for selecting the dayX attributes, generates a name like day1-dayN, and executes Generate Aggregation with this regular-expression-based attribute filter.
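
    For readers who want to check the expected result outside RapidMiner, the batching logic described above can be sketched in plain Python. This is not the attached process: the function name `aggregate_in_batches`, the `prefix` parameter, the sample rows, and the choice of sum as the aggregation function are all assumptions made for illustration.

```python
import re


def aggregate_in_batches(rows, batch_size, prefix="day"):
    """Sum consecutive day columns in groups of batch_size for each row.

    rows is a list of dicts, one per student. Summing is an assumption;
    the thread does not say which aggregation function is used.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not rows:
        return []
    # Select the day columns with a regular expression and order them
    # numerically, so that day2 comes before day10.
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    day_cols = sorted(
        (c for c in rows[0] if pattern.match(c)),
        key=lambda c: int(pattern.match(c).group(1)),
    )
    day_set = set(day_cols)
    result = []
    for row in rows:
        # Keep non-day columns such as the student id unchanged.
        out = {k: v for k, v in row.items() if k not in day_set}
        for start in range(0, len(day_cols), batch_size):
            batch = day_cols[start:start + batch_size]
            # Name the new attribute after the first and last day in the batch.
            name = f"{batch[0]}-{batch[-1]}"
            out[name] = sum(row[c] for c in batch)
        result.append(out)
    return result


# Hypothetical sample data: two students with 300 day attributes each.
rows = [
    {"id": 1, **{f"day{d}": 1 for d in range(1, 301)}},
    {"id": 2, **{f"day{d}": d for d in range(1, 301)}},
]
agg10 = aggregate_in_batches(rows, 10)  # 30 aggregated attributes per row
agg30 = aggregate_in_batches(rows, 30)  # 10 aggregated attributes per row
```

    Changing the second argument plays the same role as the batch size parameter of Loop Batches: one number controls how many day columns go into each aggregated attribute.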



    lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @cindyliu_au,

    I know it is not an optimal method, but you can use the Generate Aggregation operator and select a subset of attributes.
    In the attached file, you can find a process...

    Hope this helps,


    cindyliu_au Member Posts: 6 Newbie

    The approach you provided is a manual method, which I have already implemented.

    I am looking for an automatic way because I could have 4800 attributes later on, and I would like to try every 10/20/30/40 days as well as every 7/14/21/28/35 days. That would be a huge workload if I did it manually...

    Still, thank you for your help!

    I'm hoping someone can give me some clues about an automatic way.
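
    To give a sense of the workload, the window sizes listed above can be enumerated with a short Python sketch. The helper name `batch_plan` is hypothetical, and treating a leftover group as simply shorter is an assumption; the thread does not say how a partial final batch should be handled.

```python
def batch_plan(n_attributes, batch_size):
    """Return (number of groups, size of the last group) for one window size."""
    full, remainder = divmod(n_attributes, batch_size)
    if remainder == 0:
        return full, batch_size
    # The attributes do not divide evenly: one extra, shorter group remains.
    return full + 1, remainder


# Window sizes mentioned in the post, applied to 4800 day attributes.
window_sizes = [10, 20, 30, 40, 7, 14, 21, 28, 35]
plans = {w: batch_plan(4800, w) for w in window_sizes}
for w, (groups, last) in plans.items():
    print(f"every {w} days: {groups} groups, last group covers {last} days")
```

    In this sketch, 4800 attributes divide evenly for the 10/20/30/40-day windows, while each of the 7-day multiples leaves a shorter final group that needs a decision (keep, drop, or pad).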

    cindyliu_au Member Posts: 6 Newbie
    edited December 2021
    Hi BalazsBarany,

    This is an awesome solution! It works very well!

    Thank you so much!

    By the way, your solution uses the operators "recall" and "remember changes", which look very interesting! I'll have to learn what they are and how they work :D

    Thanks again BalazsBarany
