Remove Duplicates
Hi guys, I am new to RapidMiner, so please bear with me.
I am trying to use Turbo Prep to do some data cleansing before analysing because the dataset has many issues.
I select the name/ID column and use the Remove Duplicates function.
In this case:
ID      Number of buying
AA01    8
AA01    10
It seems that RapidMiner keeps only the first row for each ID, so only the first 'Number of buying' value survives.
Is there any way to keep the average, sum, max, or min of the 'Number of buying' column instead?
Best Answer
BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
Hi!
Of course this is possible. However, this is grouping (aggregation) rather than duplicate removal. You can find it in Turbo Prep under "Pivot".
Regards,
Balázs
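For readers who want to see the difference between the two operations outside Turbo Prep, here is a minimal sketch in plain Python. It is not RapidMiner code and does not reflect how Turbo Prep is implemented; the variable names (`rows`, `first_seen`, `aggregated`) are made up for illustration, and the data is the two-row example from the question.

```python
from collections import defaultdict
from statistics import mean

# The two example rows from the question: (ID, Number of buying).
rows = [("AA01", 8), ("AA01", 10)]

# Duplicate removal: keep only the first value seen for each ID.
first_seen = {}
for row_id, count in rows:
    first_seen.setdefault(row_id, count)

# Grouping: collect every value per ID, then aggregate each group.
groups = defaultdict(list)
for row_id, count in rows:
    groups[row_id].append(count)

aggregated = {
    row_id: {
        "average": mean(values),
        "sum": sum(values),
        "max": max(values),
        "min": min(values),
    }
    for row_id, values in groups.items()
}

print(first_seen)
print(aggregated)
```

Duplicate removal leaves AA01 with the single value 8, while grouping yields one row per ID with an average of 9, a sum of 18, a max of 10, and a min of 8.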