Options

# Daily hour sums on multi-day spans of time

Member Posts: 2 Contributor I
edited November 2018 in Help
I have an example set from a spreadsheet that has start and end times which can span multiple days, and I need the total number of hours per day that this time spanned.

For instance, if I have one record with a start time of May 15, 2014 11:00 and an end time of May 15, 2014 14:00, the result should be:
May 15, 2014 3

The more difficult case is when I have one record with a start time of May 15, 2014 09:00 and an end time of May 18, 2014 22:00, I need the following result:
May 15, 2014 16
May 16, 2014 24
May 17, 2014 24
May 18, 2014 22

Then I need to sum the totals per day. So for an example set containing the two records mentioned above, the result would be:
May 15, 2014 19
May 16, 2014 24
May 17, 2014 24
May 18, 2014 22

The approach I am trying is to create extra examples for each of the in-between days (May 16 and 17 in the above example) and assign them 24 hours. But I am having trouble with this. I know how to loop through the example set and create a single new example for each day individually, but I don't know how to then connect ALL of those new examples to the original example set (my first attempt UNIONed only the last one from the loop to the set).

Can anyone help with how to do this, or a better way to solve the overall problem?

Thanks,
Kelly

• Options
RapidMiner Certified Expert, Member Posts: 458 Unicorn
Hello Kelly

There is an operator called Fill Data Gaps that can fill in missing values in sequences. You might need some gymnastics to make it work.

After that, the hour will be missing for the newly inserted rows so you would use Replace Missing Values to change missing to 24.

Then you would use the Aggregate operator to sum the hours for each day.

regards

Andrew
• Options
Member Posts: 2 Contributor I
Thanks for your response. The Fill Data Gaps operator looked interesting, but I eventually gave up trying to make this work off of a spreadsheet alone.

Although this is not a real solution to my original question, it might help someone else to know that what ended up working was doing everything via MySQL. I created a table with a single column storing a datetime for each day from 2005 through 2050 (each day starting at 00:00). Then in a single Read Database, I queried the equivalent data that I had in the spreadsheet from MySQL and joined it to the days table on day between start time and end time. This gave me all the in between days. I had to do some other "gymnastics" to get everything else working, but it does work now.

My query was like this:

SELECT i.number, i.start_time, i.end_time, d.dt
FROM inc i
JOIN days d on ((i.start_time-interval 1 day)<d.dt AND i.end_time>d.dt)
WHERE month(inc.sys_created_on) >= %{month_start}
AND month(inc.sys_created_on) < %{month_end}+1
AND year(inc.sys_created_on) = %{year};