How do we delete threads?
I gave up on Aggregate by group and moved on to Loop Values -> Aggregate instead, because that's all I could 'make work'.
Who knows if it's actually working, though...
The Operator "Group into Collection" in the 'Operator Toolbox' extension exists for exactly this reason.
The result is a collection of ExampleSets grouped by one Attribute.
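As a rough illustration only (plain Python, not the extension's actual implementation), grouping into a collection amounts to splitting the rows into one subset per value of the chosen attribute. The helper name `group_into_collection` and the `url`/`visits` sample data are made up for this sketch.

```python
from collections import defaultdict

def group_into_collection(rows, attribute):
    """Split a list of row dicts into one sub-list per value of `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return dict(groups)  # dicts keep insertion order, so groups appear first-seen first

rows = [
    {"url": "a.com", "visits": 3},
    {"url": "b.com", "visits": 1},
    {"url": "a.com", "visits": 5},
]
collection = group_into_collection(rows, "url")
for key, subset in collection.items():
    print(key, len(subset))
```

Each entry of the resulting collection can then be aggregated on its own, which is roughly what a Loop Values -> Aggregate setup does by hand.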
Would it be possible to share the process XML here so that we can step through the process and see what the error is?
I am not sure I understand what you are trying to do, but don't you just need the Remove Duplicates operator to keep one record per URL?
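A minimal Python sketch of the idea, assuming rows are plain dicts with a hypothetical `url` key. This version keeps the first row it sees for each URL; which record the actual operator retains is worth checking in its documentation.

```python
def remove_duplicates(rows, key):
    """Keep only the first row seen for each distinct value of `key`."""
    seen = set()
    result = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            result.append(row)
    return result

rows = [
    {"url": "a.com", "visits": 3},
    {"url": "b.com", "visits": 1},
    {"url": "a.com", "visits": 5},
]
unique_rows = remove_duplicates(rows, "url")
print(unique_rows)
```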
In your example, you are taking the mode of 100% missing values. The attribute has metadata about all of its possible values, and it finds that every one of those potential values appears zero times. There is no clear winner, so it just picks one of the values as the mode. You could argue that it should keep the result missing instead. Did I understand correctly that this would be the expected behaviour from your perspective?
Also, the Aggregate operator has a parameter called "ignore missings" that is set to true by default. If you set it to false, do you get the result you expect?
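To make the draw concrete, here is a small Python analogy, not RapidMiner's code. `mapping` stands in for the potential values a nominal attribute remembers, and `None` stands in for "?". Two things are assumptions: that the tie goes to the first value in mapping order, and that "ignore missings" = false makes the result missing whenever the group contains a missing value.

```python
from collections import Counter

MISSING = None  # stands in for RapidMiner's "?"

def mode_over_mapping(values, mapping, ignore_missings=True):
    """Mode found by counting every *potential* value listed in `mapping`.

    `mapping` can contain values that no longer occur in `values`.
    """
    if not ignore_missings and MISSING in values:
        # Assumed behaviour: with ignore missings = false, any missing
        # value makes the aggregate missing.
        return MISSING
    counts = Counter(v for v in values if v is not MISSING)
    # max() returns the first maximal item in mapping order, so on a draw
    # (including the all-zero case) the earliest potential value wins.
    return max(mapping, key=lambda v: counts[v])

mapping = ["red", "green", "blue"]   # potential values the attribute remembers
group = [MISSING, MISSING, MISSING]  # a group containing only missing values

print(mode_over_mapping(group, mapping))
print(mode_over_mapping(group, mapping, ignore_missings=False))
```

The first call prints `red` even though no row in the group contains it, which is the effect described above. The second call prints `None`.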
Thanks for the explanation, I think I now get what you are trying to do. I was not sceptical, I just did not fully understand.
Believe me, it does not pull the data out of thin air. Even if you filter and save a dataset, each nominal attribute remembers all the potential values it ever had. This is quite useful in many cases, so we do not intend to change that.
However, when you calculate the mode on a group that only has missing values, the mode counts the occurrences of all potential values. All of them have zero occurrences, so it does what it has to do in case of a draw: it picks one. This is a bug, and we need to make sure that if all values have zero occurrences, it returns missing ("?") as the result. I have filed this in our internal bug tracker and it will be fixed in one of the upcoming releases.
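A hedged Python sketch of the intended fix, again only an analogy with `None` standing in for "?": if no potential value occurs at all, return missing instead of breaking the all-zero draw. How the tie between real, non-zero counts is broken is an assumption (first in mapping order).

```python
from collections import Counter

MISSING = None  # stands in for "?"

def mode_fixed(values, mapping):
    """Mode over the potential values, returning missing if none occur."""
    counts = Counter(v for v in values if v is not MISSING)
    if not counts:
        # Every potential value has zero occurrences: report missing
        # instead of picking an arbitrary value.
        return MISSING
    # Assumed tie-break for genuine draws: first value in mapping order.
    return max(mapping, key=lambda v: counts[v])

mapping = ["red", "green", "blue"]
print(mode_fixed([MISSING, MISSING], mapping))
print(mode_fixed(["green", "green", "red", MISSING], mapping))
```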
Thanks for bringing this up!
Until the bug is fixed, perhaps the Operator "Materialize Data" can help.
If you have filtered a dataset and are sure that you do not want to keep the potential values, you can use this Operator right after your filtering steps, before your aggregations. It basically recreates the Metadata from the data that is actually present.
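Roughly speaking (a Python analogy, not the Operator's implementation), rebuilding the metadata means deriving the set of potential values from the rows that survived filtering, so that values that were filtered out are no longer remembered. The names `remembered`, `filtered` and `materialize_mapping` are hypothetical.

```python
MISSING = None  # stands in for "?"

def materialize_mapping(values):
    """Rebuild the potential-value list from the values actually present."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(v for v in values if v is not MISSING))

remembered = ["red", "green", "blue"]  # value set kept from before filtering
filtered = ["green", MISSING, "green"]  # data left after filtering

rebuilt = materialize_mapping(filtered)
dropped = [v for v in remembered if v not in rebuilt]
print(rebuilt)
print(dropped)
```

After this step, "red" and "blue" can no longer be picked as the mode, because they are no longer potential values.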
Hi @zprekopcsak @Edin_Klapic - if this is a recognized bug, can I move this thread to "Product Feedback" so that Balazs H. can manage?