RapidMiner

Aggregate based on values in two tables

Wisdom logo Registration now open for RapidMiner Wisdom Americas | New Orleans | October 10-12, 2018   Learn More
Highlighted
Learner III kcallan
Learner III

Aggregate based on values in two tables

I have been struggling with what I feel should be a fairly simple problem and am going around in circles at this point. I have some data relating to application processing times. The main table contains application start and completion dates, and I am trying to get a count of the number of applications which are active (ie. started but not yet completed) at the end of a given quarter. Missing end dates indicated that the application is not yet complete. I have an aggregated table which contains quarter end dates and some other data which has been aggregated from the main table. I want to add the count of active applications for each quarter end date to this table. See sample data below. Help would be appreciated!

 

Quarter end dates.PNGQuarter end dates (in aggregated table)Start and End Dates.PNGApplication start and end dates (in main table)

1 REPLY
Unicorn
Unicorn

Re: Aggregate based on values in two tables

Hi @kcallan

 

I would probably approach the problem that way: 

 

  1. In each table, generate a new attribute 'quarter' to encode all the quarters in a unique way, like 2012_3, 2012_4 etc
  2. In the second table, generate numerical attribute 'finished' with two values 0 / 1, 0 being unfinished applications and 1 finished ones
  3. Aggregate second table with grouping by 'quarter' and with aggregation function sum of the above 'finised' attribute (so for each quarter you get sum of all 1's)
  4. Join tables on 'quarter' variable