"Aggregate and remove attributes"
Hi,
I have a data set containing 2000 columns (attributes) with real-number values. What I want to do is aggregate (sum) three attributes, create a new attribute holding the sum, and remove the last two, leaving the first attribute and the resulting attribute. I can do that by using the operators generate attribute (sum) and remove range. But I want to apply this procedure to all 2000 attributes: three attributes are summed, leaving two attributes (including the resulting one), then the next three are summed, leaving two attributes, and so on. Is there a way to apply generate attribute and remove range to the whole data set automatically, or is there another procedure to do it? Please let me know if you need more specifications.
Jony
Answers
If your attributes follow a certain naming scheme (like att_1 ... att_2000), you could probably use the Loop operator with the iteration macro set. You could then use this macro to select which attributes you want to sum up.
I hope this gives you an idea how to proceed with your problem, if not feel free to ask.
Best regards,
David
Thanks for your reply. I knew a loop operator needs to be used, but the problem is that I am not an expert in macros.
Every time I use a loop, it just gives me the same result 1000 times if I set 1000 iterations. What I want is a single table that contains, for every three attribute columns, only the first column of those three plus their sum, and so on.
If the inputs are 1, 2, 3, 4, ..., I want the output to be 1, (1+2+3), 4, (4+5+6), ...
Renaming the generated attributes (the sums of three) is also a problem in the loop. The name needs to be variable so that it changes in every iteration.
Do you think it is possible for you to make such a process and share it with me? Then I can understand it properly.
regards
Jony
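The output layout described above (first value of each block of three, followed by that block's sum) can be sketched outside RapidMiner in plain Python. This is only an illustration of the intended result, not a RapidMiner process; the function name `first_and_sum` and the sample row are made up for the example.

```python
def first_and_sum(values, group_size=3):
    """For each consecutive block of `group_size` values, keep the first
    value of the block and append the sum of the whole block."""
    out = []
    # Step through complete blocks only; a trailing partial block is ignored.
    for start in range(0, len(values) - group_size + 1, group_size):
        block = values[start:start + group_size]
        out.append(block[0])    # first attribute of the block
        out.append(sum(block))  # aggregated attribute
    return out


row = [1, 2, 3, 4, 5, 6]
print(first_and_sum(row))  # [1, 6, 4, 15]
```

For 2000 columns this would give 666 complete blocks; how to treat the two leftover columns is a separate decision that the sketch deliberately leaves out.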
The result of this is a collection of the desired aggregations. What you need to do next is extract these values and simply add them to your dataset as fits your requirements.
I have 132 attributes, so I set the iteration number to 44. Now I have 44 results, with the last one named a_sum_130_131_132, which is fine. The problem is that all the aggregation results are the same: from the first aggregation 1_2_3 to the last 130_131_132, only the first three attributes are added. So the 1_2_3 result is right, but all the others are wrong because they are identical (in the attributes parameter of the generate attribute operator I selected the first three). Can you please tell me how to solve this? I also wanted to name the generated attribute sum_time 1_time 2_time 3, since my attribute names are time 1, time 2, time 3, ... Can you please tell me how I can do that?
thanks
Jony
In my example process I have used the macro names i_1, i_2 and i_3, which are set in each iteration to the numbers you want to aggregate, like {1,2,3}, {4,5,6} and so on. These macros are then used in the attributes selection of the Generate Aggregation operator.
The entries there have to look like this:
att%{i_1}, att%{i_2} and att%{i_3}, so for each iteration the values of the macros are evaluated and the correct attributes can be selected.
The attribute name is simply defined in the field named "attribute name" of the Generate Aggregation operator. Here again the actual values of the macros are pasted in for each iteration.
I hope this clarifies everything for you.
Best Regards,
David
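To make the macro substitution described above concrete, here is a small Python sketch that mimics how `%{i_1}`, `%{i_2}` and `%{i_3}` would be filled in per iteration. It is not RapidMiner code. It assumes a 0-based iteration counter, and the helper names and the `sum_%{i_1}_%{i_2}_%{i_3}` name template are hypothetical, chosen only for illustration.

```python
import re


def macros_for(iteration):
    # Assumption: iteration starts at 0, giving {1,2,3}, {4,5,6}, ...
    return {f"i_{k}": 3 * iteration + k for k in (1, 2, 3)}


def expand(template, macros):
    # Replace every %{name} in the template with the macro's current value.
    return re.sub(r"%\{(\w+)\}", lambda m: str(macros[m.group(1)]), template)


def names_for(iteration, prefix="att"):
    macros = macros_for(iteration)
    # Attribute names selected for the aggregation in this iteration.
    selected = [expand(prefix + "%{i_" + str(k) + "}", macros) for k in (1, 2, 3)]
    # Name of the generated attribute (hypothetical template).
    new_name = expand("sum_%{i_1}_%{i_2}_%{i_3}", macros)
    return selected, new_name


print(names_for(0))   # (['att1', 'att2', 'att3'], 'sum_1_2_3')
print(names_for(43))  # (['att130', 'att131', 'att132'], 'sum_130_131_132')
```

The point of the sketch is that the selected names change with the iteration only because the macros are re-evaluated each time; a fixed selection of the first three attributes would repeat the same sum in every iteration.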
I understand your point, but I am still having trouble solving my problem.
When I use att%{i_1}, att%{i_2} and att%{i_3} in the parameters of my Generate Aggregation operator, it does not give me any result. The attributes are generated as att1, att2, and so on (in the operator's parameters, not in the result), and these are not available in my data. I first thought that att1 automatically selects the first attribute, but it seems it does not. By the way, my first attribute is ID, which is not supposed to be aggregated; I want my aggregation to start from the second attribute. I have also used the iteration macro names i_1, i_2, i_3 like yours, and the same function as well.
I can send you my data, but I can't find any way to attach it.
regards
jony
In my example the attributes are named the default way, which is att1, att2, ..., and for your data it is time_1, time_2, ...
So you have to use these names in the selection during your aggregation: instead of selecting att%{i_1}, it should read time_%{i_1}.
RapidMiner is then referring to the actual attribute names and not some metadata. As a consequence, you should not be worried about the ID attribute, because it has a different name.
regards,
David
i_1 is (%{iteration}*3)*2
i_2 is (%{iteration}*3+1)*2 and
i_3 is (%{iteration}*3+2)*2
so you get all even numbers in blocks of 3 starting with 0.
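As a quick check of the three formulas above, the following Python sketch evaluates them for the first few iterations, assuming the iteration counter starts at 0. The function name `even_block` is made up for the example.

```python
def even_block(iteration):
    # Direct transcription of the three macro formulas.
    i_1 = (iteration * 3) * 2
    i_2 = (iteration * 3 + 1) * 2
    i_3 = (iteration * 3 + 2) * 2
    return i_1, i_2, i_3


for iteration in range(3):
    print(iteration, even_block(iteration))
# 0 (0, 2, 4)
# 1 (6, 8, 10)
# 2 (12, 14, 16)
```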
The case of the first attribute, named 12-31-22, I would just handle separately.
For example, after Time 01-01-22 I have Time 01-02-00, Time 01-02-02, and after Time 01-02-22 there is Time 01-03-00, etc. How do I handle those?
Then David's suggested process should work.
(A word of caution: make sure that all of your columns are in the order you want them to begin with, otherwise when they get replaced by generic names it won't be clear which column is which.)
Hope that helps.
Is that what you meant?
This helps, but the thing is that after aggregating with the new names I get aggregated columns named sum_1_2_3, for example, whereas I want the output column names to be sum_time-12-31-22_time-01-01-00_time-01-01-04 and so on. I mean, in the result I also want my original names. If I replace the names and then do the aggregation, the result has the new names (obviously). But how do I replace those with the original names?
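One way to think about restoring the original names is to keep a lookup from each generic index back to the original column name before renaming, and then rebuild the aggregated names from it. The Python sketch below illustrates that idea only; it is not a RapidMiner process, the helper names are hypothetical, and it assumes the generic aggregated names have the form sum_1_2_3 with 1-based indices in the original column order.

```python
def build_name_map(original_names):
    # Generic 1-based index -> original attribute name, in column order.
    return {i: name for i, name in enumerate(original_names, start=1)}


def restore_sum_name(generic_sum_name, name_map):
    # "sum_1_2_3" -> "sum_<original 1>_<original 2>_<original 3>"
    prefix, *indices = generic_sum_name.split("_")
    return "_".join([prefix] + [name_map[int(i)] for i in indices])


originals = ["time-12-31-22", "time-01-01-00", "time-01-01-04"]
name_map = build_name_map(originals)
print(restore_sum_name("sum_1_2_3", name_map))
# sum_time-12-31-22_time-01-01-00_time-01-01-04
```

The lookup has to be built while the columns are still in their original order, which is the same caution raised earlier in the thread about generic names.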