
"Aggregate Operator creates empty example set with Group-By date"

chrismxnr Member Posts: 12 Contributor II
edited June 2019 in Help
Hello,
I have a problem with the group-by option of the Aggregate operator. I have an example set in which I want to aggregate some per-day information. So I created an attribute that contains the date of every example, added the Aggregate operator, and used the date as the group-by attribute. This always gives me an empty example set as the result. In some tests I found that I always get an empty result if there is a group-by attribute of type date.

At the moment, as a workaround, I convert these dates to nominal before the aggregation and back to dates afterwards.
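To make the workaround concrete outside of RapidMiner, here is a minimal Python sketch of the same three steps: date to nominal, aggregate, nominal back to date. The sample rows, the `%Y-%m-%d` pattern, and the use of a sum as the aggregation function are assumptions chosen for illustration; none of this is RapidMiner's own API.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sample data: (timestamp, value) pairs.
rows = [
    (datetime(2012, 3, 1, 9, 15), 10.0),
    (datetime(2012, 3, 1, 17, 40), 5.0),
    (datetime(2012, 3, 2, 8, 5), 7.0),
]

# Step 1: date -> nominal. The format pattern fixes the granularity (one key per day).
keyed = [(ts.strftime("%Y-%m-%d"), value) for ts, value in rows]

# Step 2: aggregate (here: sum) per nominal key.
totals = defaultdict(float)
for key, value in keyed:
    totals[key] += value

# Step 3: nominal -> date, so the grouped result carries real dates again.
per_day = {
    datetime.strptime(key, "%Y-%m-%d").date(): total
    for key, total in totals.items()
}
```

Because the grouping key is a plain string, the aggregation step never has to compare date values directly.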

I also posted this in the bugtracker some time ago, but since it was neither fixed nor commented on, I wanted to confirm that this is not just a problem with my configuration. At least I guess it's not the desired behaviour?

Best regards,
Chris

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Chris,

    yes, grouping on date attributes in the Aggregate operator does not work directly. You have to transform the date into a nominal attribute first. This may seem a bit strange, but it actually helps, since you can easily define the desired granularity of the aggregation by applying the operators Date to Nominal or Date to Numerical first. Otherwise it would not be clear whether you would like to aggregate a date attribute by day, by month, or by year, or whether you would like to aggregate a date_time by hour, by day, by...

    Certainly, we could (and probably will) extend the Aggregate operator so that the user can select the desired granularity if a date(_time) attribute is selected for grouping. Until then, this "workaround" allows you to use any granularity you like.

    I hope this explanation helps to clarify the issue.

    Cheers,
    Ingo
  • chrismxnr Member Posts: 12 Contributor II
    Hi,
    thanks for the fast reply. Yes, I think that would be a good option :). If not, it might be good to show a warning or something like that (or at least a hint in the help text), because I spent some time figuring out why my example set was always empty after aggregation.

    Best regards,
    Chris
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Chris,

    after talking about this issue here at Rapid-I, we decided to change the Aggregate operator to also allow grouping by date in one of the next releases, but for now without granularity settings. That means that dates are considered equal if and only if they are identical down to the millisecond. Until we extend the operator with granularity settings, the user has to simulate them with some preprocessing steps before applying the Aggregate operator.

    Best,
    Marius
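
The difference between exact, millisecond-level grouping and grouping at a coarser granularity can be illustrated with a short Python sketch. The `truncate` helper, the sample timestamps, and the granularity names are hypothetical and only mimic the kind of preprocessing described above; they are not part of RapidMiner.

```python
from collections import Counter
from datetime import datetime

def truncate(ts, granularity):
    """Hypothetical preprocessing step: cut a timestamp down to the given granularity."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")

# Three timestamps on the same day; the first two differ by a single millisecond.
timestamps = [
    datetime(2012, 3, 1, 9, 15, 0, 1000),  # 09:15:00.001
    datetime(2012, 3, 1, 9, 15, 0, 2000),  # 09:15:00.002
    datetime(2012, 3, 1, 17, 40, 0, 0),    # 17:40:00.000
]

# Grouping on the raw values: every distinct timestamp forms its own group.
exact_groups = Counter(timestamps)

# Grouping after truncation: timestamps collapse into hourly or daily groups.
hour_groups = Counter(truncate(ts, "hour") for ts in timestamps)
day_groups = Counter(truncate(ts, "day") for ts in timestamps)
```

With these sample values, exact grouping yields three groups, hourly truncation yields two, and daily truncation yields one, which is why a per-day aggregation needs the truncation (or a date-to-nominal conversion) as a preprocessing step.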