Count distinct in Turbo Prep and conflict between Sum and Count Distinct in Aggregate operator
CiaoHerman
Member Posts: 1 Learner I
Hi fellows,
I have two questions here. Both are about aggregate functions.
- Is there a way to do a Count Distinct in the Pivot under Turbo Prep? I know it can be done in a Process by selecting 'only distinct', but I only need some simple preprocessing, so building a Process should not be necessary.
- Because I cannot do this step in Turbo Prep, I switched to a Process. With the Aggregate operator I need both a Sum and a Count Distinct, but when I select 'only distinct' it also applies to the Sum (giving a Sum Distinct), so I have to use two operators.
Best Answer
IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi,
Currently the "only distinct" option is not available in Turbo Prep (yet). I am actually somewhat surprised how long it took until somebody brought it up here. This is actually not a big change, so let me see if we can squeeze this into the next release (v9.1, coming next month). The beta phase for this has actually started today and typically that means a feature-freeze. But since this is such a small change we may be able to pull it off... stay tuned.
On the second point: yes, it is indeed a bit of a problem that the "only distinct" option is defined only globally and not also locally per aggregate function. In TP, we would actually need to do the same in the background, i.e. create one aggregate operator for all the functions with "only distinct" activated and another one for all the functions where it is not. This would make the change above a bit bigger though, and it then definitely won't make it into the next release...
Best,
Ingo
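To make the two-operator workaround concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It assumes that a global "only distinct" deduplicates the values before every aggregation function is applied, and it uses hypothetical sample data and helper names (`aggregate_sum`, `aggregate_count_distinct`, `join_on_key`, `sum_distinct`) that are not part of RapidMiner. The idea is one aggregation without distinct for the Sum, one with distinct for the Count, and then a join of both results on the group key.

```python
from collections import defaultdict

# Hypothetical sample data: (customer, product, amount)
rows = [
    ("alice", "apple", 10),
    ("alice", "apple", 10),
    ("alice", "pear", 5),
    ("bob", "apple", 10),
]

def aggregate_sum(rows):
    """First aggregation: plain sum of amount per customer (no 'only distinct')."""
    totals = defaultdict(int)
    for customer, _product, amount in rows:
        totals[customer] += amount
    return dict(totals)

def aggregate_count_distinct(rows):
    """Second aggregation: number of distinct products per customer."""
    seen = defaultdict(set)
    for customer, product, _amount in rows:
        seen[customer].add(product)
    return {customer: len(products) for customer, products in seen.items()}

def join_on_key(left, right):
    """Combine both results on the group key, keeping (sum, distinct count)."""
    return {key: (left[key], right[key]) for key in left if key in right}

def sum_distinct(rows):
    """Assumed effect of a global 'only distinct' on the sum:
    repeated amounts within a group are counted only once."""
    seen = defaultdict(set)
    for customer, _product, amount in rows:
        seen[customer].add(amount)
    return {customer: sum(amounts) for customer, amounts in seen.items()}

result = join_on_key(aggregate_sum(rows), aggregate_count_distinct(rows))
print(result)              # {'alice': (25, 2), 'bob': (10, 1)}
print(sum_distinct(rows))  # {'alice': 15, 'bob': 10}
```

For alice, the plain sum is 25 while the distinct sum drops to 15, which is why the Sum and the Count Distinct cannot share a single global "only distinct" setting in this sketch.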