I'm trying to use the Windowing and Sliding Window Validation operators to predict future values. I've watched Thomas Ott's YouTube video and looked at other posts in the forum, but I am still not confident using these operators, so I'd like to ask some questions. I want to look at the settings at a very basic level to understand how to use them.
Let's say I have 1000 examples in my training set that cover 1000 days of a stock price. Is my understanding here correct?
First, the Windowing operator:
Window size: This is the number of days RapidMiner (RM) will use to predict the future value. If I set it to 10, RM will use 10 days of data to predict the future value. For example (let's not think about holidays and weekends), it will use Jan 1 -> Jan 10 to predict Jan 11.
Step Size: Decides which values to skip, or step over. If the step size is 7, RM will only use the values of Jan 1, 8, 15 etc. The skipped values will be left out and not used for predictions. It is the same as creating a new dataset with the first day of every week, setting step size to 1.
Create label: Here I choose the attribute I want to predict. I set it to "Yes" and chose the closing price attribute.
Here, we also have to set the horizon. Let's say my Window size is 10, step size is 1. If horizon is set to 1, RM will use the values of Jan 1 - Jan 10 to predict the value of Jan 11. If horizon is set to 5, RM will use the values of Jan 1 to Jan 10, to predict the value of Jan 15. Is that right?
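To make the window size and horizon examples above concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation: `make_windows` and its parameter names are made up for illustration. It assumes the label is the value `horizon` rows after the last row of the window, and it assumes `step_size` is the number of rows the window start advances between consecutive examples. That second assumption is only one possible reading and differs from the "skip rows" reading in the question.

```python
from typing import List, Tuple


def make_windows(
    series: List[float],
    window_size: int,
    step_size: int = 1,
    horizon: int = 1,
) -> List[Tuple[List[float], float]]:
    """Build (window, label) pairs from a series (hypothetical helper).

    Assumption: the label is the value `horizon` rows after the last
    row of the window. Assumption: `step_size` is how far the window
    start moves between consecutive examples.
    """
    if window_size < 1 or step_size < 1 or horizon < 1:
        raise ValueError("window_size, step_size and horizon must be >= 1")

    examples = []
    start = 0
    # Stop once the label row would fall outside the series.
    while start + window_size - 1 + horizon < len(series):
        window = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon]
        examples.append((window, label))
        start += step_size
    return examples


# Toy series where the value equals the day number (day 1 .. day 20).
days = list(range(1, 21))

h1 = make_windows(days, window_size=10, step_size=1, horizon=1)
h5 = make_windows(days, window_size=10, step_size=1, horizon=5)

print(h1[0])  # window of days 1..10 with the day-11 value as label
print(h5[0])  # window of days 1..10 with the day-15 value as label
```

Under these assumptions the first window matches the examples in the question: days 1 to 10 are paired with day 11 for a horizon of 1, and with day 15 for a horizon of 5.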
Now on to the Sliding Window Validation operator.
Now, as far as I understand, the validator does not improve the model in itself. It is simply a tool to validate whether or not the model I have created is performing well. The results from the validator can be used to understand the model better and optimize it. Correct?
In the validator I find these settings.
Training Window Width
Training Window Step Size
Test Window Width
Here, I am not quite sure what to do. Should these settings simply correspond to the settings in the Windowing operator? I believe this is not the right answer.
Following my previous examples, could we create similar examples for these settings to put it into context?
Did you check out this thread post? I go into pretty deep detail on the Windowing operator.
Yes, I've read this post and it's very good on the Windowing operator. I just wanted to confirm that my understanding of this operator was correct. I guess my question is really about the sliding window validator and how its settings work in relation to the Windowing operator.
As far as I understand, the settings don't affect the model; they only test its performance, right?
Would it be right to say that setting a larger window width in the window validator is comparable to reducing number of folds in an x-validator?
Training step size and horizon are still unclear to me.
The Sliding Window Validation is used for backtesting. Once you've windowed your data, it will slide across your time series in a defined way and train on a window then try to test it on another window.
The training window width is just that: how many time units (width) you want your model to be trained on. The test window is your out-of-sample data in the time series, where it tests the model and measures the performance. The step size is how many time units you slide the window ahead. The horizon is just the time unit space between the training and test windows.
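A rough sketch of how these four settings could interact, in plain Python. This is not RapidMiner's code: `sliding_window_splits` is a made-up helper, and it assumes that a horizon of 1 means the test window starts on the row right after the training window ends. It only produces the row ranges for each train/test round; fitting and scoring a model on them is left out.

```python
from typing import Iterator, Tuple


def sliding_window_splits(
    n_examples: int,
    training_window_width: int,
    training_window_step_size: int,
    test_window_width: int,
    horizon: int = 1,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_rows, test_rows) index ranges (hypothetical helper).

    Assumption: horizon=1 puts the test window directly after the
    training window; larger values leave a gap of horizon - 1 rows.
    """
    settings = (training_window_width, training_window_step_size,
                test_window_width, horizon)
    if min(settings) < 1:
        raise ValueError("all window settings must be >= 1")

    start = 0
    while True:
        train_end = start + training_window_width   # exclusive
        test_start = train_end + horizon - 1
        test_end = test_start + test_window_width   # exclusive
        if test_end > n_examples:
            break  # the test window no longer fits in the data
        yield range(start, train_end), range(test_start, test_end)
        start += training_window_step_size          # slide ahead


# 1000 windowed examples, as in the question.
splits = list(sliding_window_splits(1000, 100, 50, 50, horizon=1))
wider = list(sliding_window_splits(1000, 200, 50, 50, horizon=1))

for train_rows, test_rows in splits[:2]:
    print(train_rows, test_rows)
print(len(splits), len(wider))
```

In this sketch each round trains on one block of rows and tests on a later block, so the test rows are always out of sample for that round. With these example numbers, widening the training window from 100 to 200 rows leaves fewer rounds that fit into the 1000 rows.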
Does this help?
Ok, so the validator basically takes a window of the windowed data, tests it and moves on to the next, sliding through the data till the end of the example set? (The validator window size is not really related to the Windower window size)