Rolling features (Rolling mean, max, min, sum, ...) would be nice

Status: New
by fryasdf 01-20-2017 01:31 PM

Whenever I do data science and try to predict a target variable, I almost always include the past of the target variable. For example, when predicting how long some process will take, I would almost always include 'how long does it usually take' as a feature or even as a baseline model. One can compute this 'how long does it usually take' in different ways. For example, for every distinct process one could just take the average over the whole training set. However, this can be a bad idea because the length of the process may depend on seasonality or other mechanisms in the training data. That is why I prefer rolling window functions, i.e.


rollingMean((15,17,12,11,19,25,27,30,28), 3) would be something like (14.66667, 13.33333, 14.00000, 18.33333, 23.66667, 27.33333, 28.33333)
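To make the idea concrete, here is a minimal sketch in plain Python. The function name rolling_mean and its arguments are my own illustration, not an existing RapidMiner operator. It assumes that only full windows count, so nine inputs with a window of 3 yield seven means.

```python
def rolling_mean(values, window):
    """Return the mean of every full window of `window` consecutive values.

    Hypothetical helper for illustration; only complete windows are used.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    values = list(values)
    means = []
    # Slide the window one position at a time over the sequence.
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        means.append(sum(chunk) / window)
    return means


result = rolling_mean((15, 17, 12, 11, 19, 25, 27, 30, 28), 3)
print([round(m, 5) for m in result])
# [14.66667, 13.33333, 14.0, 18.33333, 23.66667, 27.33333, 28.33333]
```

A variant that also emits partial windows at the edges of the series would return more values than this one.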


This is not yet included in RapidMiner (RM) at all, although it is a rather common thing to do in the data science business.

Elite II


This feature already exists in RapidMiner. If you install the free Series extension, there is a moving average operator that does exactly what you want. It aggregates over a fixed window length and moves this window over the dataset. You can select the usual aggregation functions, so you can also compute, for example, the standard deviation of a window, which can be helpful too.
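For readers who want to see what "a fixed window with a selectable aggregation function" amounts to, here is a rough sketch in plain Python. It is not how the RapidMiner operator is implemented; the name rolling_aggregate and the sample list durations are assumptions made for illustration, and only full windows are aggregated.

```python
import statistics


def rolling_aggregate(values, window, aggregate):
    """Apply `aggregate` to every full window of `window` consecutive values.

    Hypothetical helper; `aggregate` is any function that takes a list,
    e.g. max, min, sum, statistics.mean or statistics.stdev.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    values = list(values)
    return [
        aggregate(values[start:start + window])
        for start in range(len(values) - window + 1)
    ]


durations = [15, 17, 12, 11, 19, 25, 27, 30, 28]

rolling_max = rolling_aggregate(durations, 3, max)
rolling_sum = rolling_aggregate(durations, 3, sum)
# Sample standard deviation needs at least two values per window.
rolling_std = rolling_aggregate(durations, 3, statistics.stdev)

print(rolling_max)  # [17, 17, 19, 25, 27, 30, 30]
print(rolling_sum)  # [44, 40, 42, 55, 71, 82, 85]
```

Passing the aggregation as a function keeps the windowing logic in one place, so the mean, max, min, sum and standard deviation all share the same sliding loop.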