
"Sliding Window Validation - What Model?"

B_Miner Member Posts: 72 Contributor II
edited May 2019 in Help
Hi All,

I will admit I am perplexed by the sliding window validation process (what it does and the parameters). In trying to understand it, the first question is what model is actually fit at the end? Is it the one using the most recent records (with the number of said records depending on the settings in the operator)?

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    do you mean: on which data is the model fitted that is delivered at the mod port?

    With kind regards,
      Sebastian Land
  • B_Miner Member Posts: 72 Contributor II
    Hi Sebastian,

    Yes, that is what I mean. What is that final model - is it fit using the last k records, where k is set in the parameters as the window?
  • dcubed Member Posts: 6 Contributor II
    Hi All,
    I had the same question and couldn't find an answer.
    What model is delivered at the mod port? If a model is returned, what is its value for future data?

    My understanding is that a new model is created and tested for each window. What we are really validating is how well the process of learning a model works, right? Thus, no single model returned at the port will be of value.

    I'm clearly confused. Please help.

    Thank you
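The per-window loop dcubed describes can be sketched in plain Python. This is an illustrative sketch of the general technique, not RapidMiner's implementation; all names here (`sliding_window_validation`, `fit`, `score`) are made up for the example:

```python
# Sliding window validation: train a fresh model on each window and test
# it on the rows that follow, so the scores measure the learning process,
# not any single model.

def sliding_window_validation(rows, train_size, test_size, step, fit, score):
    """Return one score per window; a new model is fit for every window."""
    scores = []
    start = 0
    while start + train_size + test_size <= len(rows):
        train = rows[start:start + train_size]
        test = rows[start + train_size:start + train_size + test_size]
        model = fit(train)                 # fresh model for this window
        scores.append(score(model, test))
        start += step                      # slide the window forward
    return scores

# Toy demo: the "model" is just the window mean, scored by absolute error
# against the single next row.
errs = sliding_window_validation(
    list(range(10)), train_size=3, test_size=1, step=1,
    fit=lambda train: sum(train) / len(train),
    score=lambda model, test: abs(model - test[0]),
)
```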
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    dcubed wrote:

    My understanding is that a new model is created and tested for each window. What we are really validating is how well the process of learning a model works, right?
    Exactly.
    Thus, no single model returned at the port will be of value.
    That is wrong: if anything is connected to the model output of the validation, then after the validation runs as described above, a model is trained on the complete data set and returned at the model output port.

    Best, Marius
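In code terms, Marius's answer is that the delivered model comes from one extra, final fit on all rows, separate from the per-window models. A hypothetical sketch (made-up names, not RapidMiner's actual internals):

```python
# Hypothetical sketch of the behaviour described above: the per-window
# models produce the validation scores, while the model delivered at the
# output port is one additional fit on the complete data set.

def validate_then_fit_all(rows, train_size, fit, score):
    scores = []
    for start in range(len(rows) - train_size):
        window = rows[start:start + train_size]   # training window
        holdout = rows[start + train_size]        # next row for testing
        scores.append(score(fit(window), holdout))
    final_model = fit(rows)  # trained on ALL rows; returned at the port
    return scores, final_model

# Toy demo with a window-mean "model" scored by absolute error.
scores, final_model = validate_then_fit_all(
    [0, 1, 2, 3, 4], train_size=2,
    fit=lambda w: sum(w) / len(w),
    score=lambda m, h: abs(m - h),
)
```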
  • dcubed Member Posts: 6 Contributor II

    That is wrong: if anything is connected to the model output of the validation, then after the validation runs as described above, a model is trained on the complete data set and returned at the model output port.
    The model thus returned therefore differs from all the prior models in that it is trained on all the data in the data set, not just the data in any of the prior training windows?

    Stated differently, if I have 1000 rows with a training window of 50, validated on the next row, I will have gone through 950 models, each trained on 50 rows. The model returned, however, will be trained on all 1000 rows?

    If the reason I am training on 50 rows to predict the next one is that the process generating the rows is non-stationary, does it not follow that the final model trained on all 1000 rows will be of little value for predicting row 1001?
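For what it's worth, the exact count in that example depends on the step size; with a step of 1 and a 1-row test horizon it comes out as 1000 - 50 = 950 windows, which a quick check confirms:

```python
# Window count for the example above: 1000 rows, training window of 50,
# tested on the single next row, sliding forward 1 row at a time.

n_rows, train_size, test_size, step = 1000, 50, 1, 1
starts = range(0, n_rows - train_size - test_size + 1, step)
n_models = len(starts)  # each of these models sees only its own 50 rows
```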
  • haddock Member Posts: 849 Maven
    Hi Dcubed,

    I remember having exactly this exchange with Ingo a year or so ago, right here. I was using SVMs to make short-term forecasts in foreign exchange markets, and optimised the look-back and prediction-horizon sizes in a sliding window validation. The performance figures were fine, as you would expect, but I had to store the model at every iteration within the validation just to get the last one. Wasteful, of course, and easily fixable; that's the wonder of open source!

    What I, like you, never worked out was the right scenario for using a model built on all the examples when the data exhibit concept drift.

    Happy days!
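haddock's workaround amounts to keeping the model from every iteration so that the one trained on the newest window survives. A minimal sketch with made-up helpers (a window-mean "model" stands in for the SVM):

```python
# Sketch of the workaround: store each window's model and keep the last
# one, i.e. the model trained on the most recent data, rather than using
# a final model refit on all rows. The "model" here is the window mean.

rows = list(range(10))
train_size = 3
fit = lambda window: sum(window) / len(window)

stored = []
for start in range(len(rows) - train_size):
    stored.append(fit(rows[start:start + train_size]))  # store every model
last_window_model = stored[-1]  # trained on the newest 3 rows only
```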