Time Series Forecasting for many examples

Noel Member Posts: 5 Contributor I
Hi All-
[Apologies in advance for any confusing or vague language I may use; I'm not a data scientist, so I don't know the proper terminology.]

Say I have a data set of sales volume over time for a retailer that sells screwdrivers. Their product catalog really runs the gamut: flathead, Phillips, Torx, long, short, every color you can think of, and on and on. If you wanted to forecast demand, you could create a model for one series at a time for each product (e.g. short, yellow, flathead screwdrivers, then medium-length, purple, Torx drivers with fat handles, etc.), or you could aggregate sales for all Phillips-head screwdrivers, or for all the different types of screwdrivers, to collapse them into one series.
For some reason, though, let's say you wanted to use all the data from every type of screwdriver individually to train a model. For each date, you would have data points for every type of screwdriver in inventory.

What is the "right way" to represent this in RapidMiner?

@sgenzer @tftemme

Best Answers

  • hughesfleming68 Posts: 185   Unicorn
    edited March 20 Solution Accepted
    Hi Noel,

    I see what you are trying to do. In most cases, simpler is better. Treat each ID as an independent prediction and try to determine which of your attributes actually contains any signal. Select the attribute that you feel contributes the most and, with a series of joins, build a table that consists of your assets and one windowed attribute, then run that through your cross validation. A real-world example would be using data from sector ETFs to predict overall market direction. Remember to set your cross validation to linear sampling or, better still, use a sliding window validation. Also take a look at your normalization: if you normalize each asset first and then combine them, you will lose the relationship between them, because you have put them all on the same scale. Sometimes that is what you want, but there are cases where it is not.
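    To illustrate the normalization point, here is a minimal sketch (pure NumPy, made-up prices for two assets) showing how normalizing each asset separately erases the scale relationship between them:

```python
import numpy as np

# Hypothetical prices for two assets over the same four dates.
asset_a = np.array([10.0, 11.0, 12.0, 13.0])
asset_b = np.array([100.0, 110.0, 120.0, 130.0])

def zscore(x):
    return (x - x.mean()) / x.std()

# Normalizing each asset separately puts both on the same scale:
# the two columns below come out identical, even though asset_b
# trades at roughly ten times the price of asset_a.
separate = np.column_stack([zscore(asset_a), zscore(asset_b)])

# Normalizing the combined matrix instead preserves the relative levels.
combined = np.column_stack([asset_a, asset_b])
joint = (combined - combined.mean()) / combined.std()
```

    Whether you want the first or the second behaviour depends on your use case, as noted above.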

    I am not sure that combining the attributes the way you are suggesting will give you the results you are looking for. Working up from the simplest model is always best, as it is already hard enough to separate signal from noise.

    Be aware that differentiation in order to achieve a stationary time series may actually result in over-differentiation. A partial solution is to use fractional differentiation together with the Augmented Dickey-Fuller test to estimate how much differentiation is actually necessary to achieve stationarity. This may or may not be needed, but it is worth investigating whether it gives you better results. PM me if you would like the Python code to test this. Rather than relying on ADF tests alone, I prefer to loop over a range of values for the fractional differentiation order and see what effect each has on my prediction. RapidMiner is great for this kind of testing.
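    For reference, fractional differentiation is straightforward to sketch in plain NumPy (the truncation window of 20 is an arbitrary choice here; statsmodels' adfuller can then be applied to the output to test for stationarity):

```python
import numpy as np

def frac_diff_weights(d, size):
    # Weights of the fractional differencing operator (1 - B)^d,
    # via the standard recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k.
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series, d, window=20):
    # Apply a truncated fractional difference of order d to a 1-D series.
    w = frac_diff_weights(d, window)
    out = []
    for i in range(window - 1, len(series)):
        # Most recent observation gets weight w[0].
        out.append(np.dot(w, series[i::-1][:window]))
    return np.array(out)

# Sanity checks: d = 0 returns the series unchanged (minus the warm-up
# window), and d = 1 reduces to a plain first difference.
x = np.cumsum(np.random.default_rng(0).normal(size=200))
assert np.allclose(frac_diff(x, 0.0), x[19:])
assert np.allclose(frac_diff(x, 1.0), np.diff(x)[18:])
```

    Looping `d` over, say, `np.arange(0, 1.05, 0.05)` and scoring each resulting series is the kind of sweep Alex describes.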

    Regards,

    Alex

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,088   Unicorn
    I may not be understanding your question.  When you say "you wanted to use all the data from every type of screwdriver individually to train a model," do you mean that you want a single model representing total demand for screwdrivers?  If so, the conventional approach would be to first aggregate all sales by date, and then model that series with traditional time series approaches (e.g. ARIMA, Holt-Winters, etc.).
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
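    A minimal pandas sketch of that aggregate-then-model approach (column names and numbers are made up, and pandas' `ewm` stands in for a proper ARIMA or Holt-Winters fit):

```python
import pandas as pd

# Hypothetical per-product daily sales.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "product": ["flathead", "phillips", "flathead", "phillips"],
    "units": [10, 5, 12, 7],
})

# Step 1: collapse all products into one series by aggregating per date.
total = sales.groupby("date")["units"].sum().sort_index()

# Step 2: fit a forecast to the aggregated series.  Simple exponential
# smoothing (what ewm computes with adjust=False) is used here only as a
# stand-in for the ARIMA / Holt-Winters operators mentioned above.
alpha = 0.5
one_step_forecast = total.ewm(alpha=alpha, adjust=False).mean().iloc[-1]
```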
  • Noel Member Posts: 5 Contributor I
    Thanks for the response, Telcontar120!
    Assume I have daily sales numbers for every flavor of screwdriver carried by the retailer and do not want to aggregate the data into a single time series. (Unfortunately, my ability to use this analogy breaks down here because I can't think of why one would want to model sales of each type of screwdriver individually instead of aggregating them... I can try to come up with another analogy if it is helpful, but I'm afraid it will muddy the water.)
  • Brian_Wells Member Posts: 5 Contributor II
    Hello, Noel!  I can completely relate to your situation, and though I am still struggling to get a clean forecasting model built using a neural net and windowing, I might have some insight into your issue.  First, I agree with you that forecasting at the most granular level and then rolling up is typically a more accurate approach, provided that you have enough data at the resolution you are attempting to model (an entirely separate rabbit hole).  This approach is supported by numerous texts I have read, as well as a couple of Ingo's videos.

    I am starting (I think) from the opposite side of the fence, in that I was given a sales report from one of our business units with records over a 10-year period and hundreds of different aggregation levels, their hierarchy described by a forecast "key" attribute that was simply the product family, product, model, sales channel, store classification, etc.  As an engineer and not a finance person, this approach initially threw me for a loop until I realized that I could set up a standard template to simply filter the key for the level of granularity I wanted and apply my forecast model to those records.

    If your sales data is not combined in such a way, it is easy to create the keys yourself, append the data together, and use the approach I took.  The reason I chose this approach is that once you determine the appropriate level of granularity and filter to that level, you can peel off the list of keys, de-duplicate it, and use it to set up a loop to isolate, for example, the sales data for each part, apply your forecast model, collect the results, and finally use TurboPrep to pivot and aggregate the resulting forecast data to create whatever roll-up reports you need.

    Hope this makes sense.  Key takeaway - looping and filtering is a huge force multiplier in cases like these.
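    In pandas, the loop-and-filter pattern described above might look like this (the key format, column names, and per-key model are all hypothetical; your real forecast model replaces the placeholder mean):

```python
import pandas as pd

# Hypothetical long-format sales table with a composite forecast "key".
df = pd.DataFrame({
    "key": ["fam1|prodA", "fam1|prodA", "fam1|prodB", "fam1|prodB"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
    "units": [3, 4, 10, 12],
})

forecasts = []
# The de-duplicated key list drives the loop, as described above.
for key in df["key"].unique():
    series = df.loc[df["key"] == key].sort_values("date")["units"]
    # Placeholder model: forecast the next period as the series mean.
    # In RapidMiner this step is the forecast model inside the Loop operator.
    forecasts.append({"key": key, "forecast": series.mean()})

result = pd.DataFrame(forecasts)
# Roll-up report: aggregate the per-key forecasts back up to the family level.
result["family"] = result["key"].str.split("|").str[0]
rollup = result.groupby("family")["forecast"].sum()
```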

    Cheers!
  • Noel Member Posts: 5 Contributor I
    Thanks @Brian_Wells for the response! I really appreciate you taking the time to post. I tested your technique for 5 examples and I'm hoping folks can have a look and verify that this is the "right" approach...

    I have time series of five assets' prices and the values of the index they belong to. I want to train a model on all five individually and forecast a value for the index one period into the future.

    Intuitively, it feels like I should iterate over the five assets, window their attributes, and "feed" them to the model one at a time. The first window for Asset #1 would look something like:

         Asset #1 Px - 2 | Asset #1 Px - 1 | Asset #1 Px - 0 | Index val - 2 | Index val - 1 | Index val - 0

    and you'd do that four more times for Assets 2-5. I can't get this method to work, though, and as a novice in machine learning, I'm not even sure it makes sense. Joining all the assets' data together for each date also comes to mind:

         Asset #1 Px - 2 | Asset #1 Px - 1 | Asset #1 Px - 0 | Asset #2 Px - 2 | Asset #2 Px - 1 | Asset #2 Px - 0 | ... other Assets, Index vals....

    but with a lot of assets that have many attributes and potentially wide windows, I could see that getting out of hand.

    I ended up looping through all the asset IDs, windowing each series, and appending the results one after another. The data set going into the model looks like: Asset #1 windowed data, Asset #2 windowed data, ... Asset #5 windowed data
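    For what it's worth, that loop-window-append step can be sketched in pandas like this (two assets, made-up prices, window of 3). Note that once the windowed sets are appended, a standard learner treats each row as an independent example; the chronology only survives through whatever attributes, such as the date or the index values, you keep in the row:

```python
import pandas as pd

# Hypothetical prices: two assets over the same dates.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02",
                            "2024-01-03", "2024-01-04"] * 2),
    "asset": ["A"] * 4 + ["B"] * 4,
    "px": [10, 11, 12, 13, 100, 101, 102, 103],
})

window = 3
rows = []
for asset, grp in prices.groupby("asset"):
    px = grp.sort_values("date")["px"].tolist()
    # Slide a window of length `window` over each asset's series and
    # append the resulting examples one after another, as described above.
    for i in range(len(px) - window + 1):
        rows.append({"asset": asset,
                     "px-2": px[i], "px-1": px[i + 1], "px-0": px[i + 2]})

examples = pd.DataFrame(rows)
```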

    Does this "scramble" the chronology? Does the model know it is seeing five examples for the same time frame and index values?

    Any help would be greatly appreciated! (process and data attached).

    Best,
    Noel

    @sgenzer @Telcontar120 @Pavithra_Rao @CraigBostonUSA @hughesfleming68
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor Posts: 2,188  Community Manager
    hi @Noel I'm calling Dr. Temme out of the bullpen for this one - he's our time series guru. :wink:

    Scott
  • SGolbert RapidMiner Certified Analyst, Member Posts: 276   Unicorn
    Hi @Noel

    the use case looks like a good candidate for vector autoregressive models (e.g. VAR or VARMA), which AFAIK are not yet implemented in RapidMiner. However, it is easy to use them with the scripting operators; I have used VAR models via R scripting before.
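    As a rough illustration of the idea (not a substitute for statsmodels' VAR class or the R `vars` package, which handle lag selection and diagnostics), a VAR(1) can be fitted by ordinary least squares in a few lines of NumPy; all data here is simulated:

```python
import numpy as np

# Simulate a 2-variable VAR(1) process: x_t = A @ x_{t-1} + noise.
rng = np.random.default_rng(42)
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.6]])
x = np.zeros((200, 2))
for t in range(1, 200):
    x[t] = A_true @ x[t - 1] + rng.normal(scale=0.1, size=2)

# Regress x_t on [1, x_{t-1}] to recover the intercept and coefficients.
X = np.column_stack([np.ones(199), x[:-1]])
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
c_hat, A_hat = coef[0], coef[1:].T

# One-step-ahead forecast of both variables jointly.
one_step_ahead = c_hat + A_hat @ x[-1]
```

    The joint forecast is the appeal here: one model captures the cross-asset dependencies instead of five independent ones.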

    Let me know if this interests you or rings a bell; I can prepare a sample process if so.

    Regards,
    Sebastian
