Finding Peak Times in a timeseries dataset

pix123 Member Posts: 27 Contributor I
edited December 2018 in Help

Hi there,

 

I am working with a data series that has a date-time stamp in one column. I am looking for a way to identify the peak times over the period covered by the collected date-time stamps. Is there a way to handle this in RapidMiner? If further details are needed, please let me know. Thanks.

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,235   Unicorn

    How do you define "peak" for this purpose?  Finding a single maximum in a series is easily done using a number of different operators.  But finding "peaks" might imply some kind of underlying periodic function or a variable definition of what exactly constitutes a peak.  That kind of analysis is a bit trickier. You might want to check out the Series extension from the marketplace and look at some of the operators in there.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • pix123 Member Posts: 27 Contributor I

    Thanks for the quick reply. By peak I mean the time of a given day at which usage is highest. I am trying to determine the times of day when usage peaks; the data was recorded in 30-minute intervals over a 140-day period. I hope this clarifies things. Is there a particular operator in the time series extension package you would recommend?

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,235   Unicorn

    It sounds like you have many separate days' worth of data, so if you are looking for patterns, you can simply aggregate by time of day (with 30-minute intervals you should have 48 data points per day) and then calculate the average and variance for each time slot. This will give you a sense of which times tend to be higher than others.  You can also get the minimum and maximum for each time of day to see how they compare to the average.
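For readers who want to check the idea outside RapidMiner, here is a minimal sketch of the time-of-day aggregation in plain Python. The function name `profile_by_time_of_day` and the sample data are made up for illustration, and population variance is an assumption, since the post does not say which variance to use.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pvariance


def profile_by_time_of_day(readings):
    """Group (timestamp, value) pairs by clock time and summarise each slot."""
    slots = defaultdict(list)
    for stamp, value in readings:
        slots[stamp.time()].append(value)
    return {
        slot: {
            "mean": mean(values),
            # Population variance is an assumption; swap in statistics.variance if preferred.
            "variance": pvariance(values),
            "min": min(values),
            "max": max(values),
        }
        for slot, values in sorted(slots.items())
    }


# Hypothetical sample: 3 days of 30-minute readings with a bump at 18:00.
start = datetime(2018, 4, 1)
readings = []
for day in range(3):
    for slot in range(48):
        stamp = start + timedelta(days=day, minutes=30 * slot)
        value = 10 + day + (50 if slot == 36 else 0)
        readings.append((stamp, value))

profile = profile_by_time_of_day(readings)
# The slot with the highest average is the typical peak time of day.
peak_slot = max(profile, key=lambda s: profile[s]["mean"])
print(peak_slot, profile[peak_slot])
```

Comparing each slot's mean with its minimum and maximum shows whether a peak is consistent across days or driven by a few unusual ones.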

    However, if you are looking to identify the specific time slot on each individual day that corresponds to the maximum value for that day, the process is going to be more complex: you'll have to aggregate by day to calculate each day's maximum, and then identify which time slot matches that value.
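The per-day variant can be sketched the same way. This is only an illustration in plain Python, not a RapidMiner process; `daily_peak_slots` and the sample readings are hypothetical, and ties are assumed to go to the earliest slot of the day.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_peak_slots(readings):
    """Return {date: (time_of_day, value)} for the highest reading of each day."""
    best = {}
    for stamp, value in readings:
        day = stamp.date()
        # Strict '>' keeps the first slot that reaches the day's maximum,
        # so ties go to the earliest slot when readings are in time order.
        if day not in best or value > best[day][1]:
            best[day] = (stamp.time(), value)
    return best


# Hypothetical sample: the peak falls at 18:00 on two days and at 08:30 on the third.
start = datetime(2018, 4, 1)
peak_index = {0: 36, 1: 36, 2: 17}  # day offset -> index of the 30-minute slot
readings = [
    (start + timedelta(days=day, minutes=30 * slot),
     50 if slot == peak_index[day] else 10)
    for day in range(3)
    for slot in range(48)
]

peaks = daily_peak_slots(readings)
# Count how often each time slot was the daily peak.
slot_counts = Counter(slot for slot, _ in peaks.values())
most_common_slot, days_at_peak = slot_counts.most_common(1)[0]
print(most_common_slot, days_at_peak)
```

Counting how often each slot is the daily peak turns the per-day results back into an overall picture of when usage tends to be highest.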

    Neither of these processes would require the series extension, by the way.  That's more useful if you are trying to do things like calculate moving averages, do smoothing of series data, or any time series forecasting such as ARIMA.

     

     

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,446  Community Manager

    You can also use "Generate Attributes" to create a new attribute that extracts the hour from the timestamp. Then you can aggregate cleanly on that attribute.
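As a rough analogue in plain Python (not the RapidMiner expression itself), deriving an hour attribute and aggregating on it might look like the sketch below. The helper `mean_by_hour` and the sample values are assumptions for illustration. Note that grouping by hour merges the two 30-minute slots within each hour.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def mean_by_hour(readings):
    """Average the values of (timestamp, value) pairs per hour of day."""
    by_hour = defaultdict(list)
    for stamp, value in readings:
        # stamp.hour plays the role of the generated "hour" attribute.
        by_hour[stamp.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Hypothetical sample: one day of 30-minute readings, busiest between 13:00 and 14:00.
start = datetime(2018, 4, 24)
readings = [
    (start + timedelta(minutes=30 * slot), 40 if slot in (26, 27) else 10)
    for slot in range(48)
]

hourly = mean_by_hour(readings)
busiest_hour = max(hourly, key=hourly.get)
print(busiest_hour, hourly[busiest_hour])
```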

     

     

    Scott
