"How to build prediction models with lagged series"

qwertzqwertz Member Posts: 130 Contributor II
edited June 2019 in Help
Dear all,

I see two possible approaches to predicting time series. As I am not that experienced yet, I would like to discuss their advantages and disadvantages with you. For prediction tasks my general assumption is that we need attributes which are correlated in some way with the label. However, especially in time series, the maximal correlation may occur at a lag.



Example data set:

label  att1  att2
5      1     7
6      2     8
7      3     9
8      4     10
9      5     11

Question 1)
I would assume that in the given example att2 contributes more to predicting the label than att1, as its course is ahead of the label. Would you agree with that, even though att1 shows the same development over time?



Approach 1)
I could imagine comparing the label with each single attribute, one by one, at each lag (i.e. correlation of label vs att1-0, label vs att1-1, label vs att1-2, label vs att2-0, ...). Then I would take only those attributes / lags that give the best correlation and feed them into a learner for prediction.
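
A minimal sketch of Approach 1 in plain Python (not a RapidMiner process), assuming Pearson correlation as the measure and assuming that "att1-1" means the value of att1 one row before the label row. The function names `pearson`, `lagged_correlation` and `correlation_table` are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # simplification: a constant series is scored as 0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(label, attribute, lag):
    """Correlate label[t] with attribute[t - lag] for a lag >= 0."""
    if lag == 0:
        return pearson(label, attribute)
    return pearson(label[lag:], attribute[:-lag])


def correlation_table(label, attributes, max_lag):
    """Score every (attribute, lag) pair, named like 'att1-0', 'att1-1', ..."""
    return {
        f"{name}-{lag}": lagged_correlation(label, values, lag)
        for name, values in attributes.items()
        for lag in range(max_lag + 1)
    }


# Example data set from the post above
label = [5, 6, 7, 8, 9]
attributes = {"att1": [1, 2, 3, 4, 5], "att2": [7, 8, 9, 10, 11]}

scores = correlation_table(label, attributes, max_lag=2)
# List the strongest (absolute) correlations first
for column, score in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(column, round(score, 3))
```

On the five-row example every attribute/lag pair comes out at 1.0 (up to rounding), because all three columns are straight lines. So this toy set cannot separate att1 from att2 by correlation; real series with some irregularity are needed for the lags to differ.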

Approach 2)
I could use the Windowing operator to create several lags for every attribute in one data set (i.e. label, att1-0, att1-1, att1-2, att2-0, ... in one set). The result would be a data set with lots of attributes that I can then feed into the learner, assuming that the learner itself decides which attributes describe the label best.

(I think that is also the idea in this article: http://rapid-i.com/rapidforum/index.php/topic,200.0.html However, one has to be careful that the model does not end up considering only lagged variants of the same attribute (e.g. att1-0, att1-1, att1-2), as in that case the model would be built more or less on copies of the same data.)
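
To make the layout of Approach 2 concrete, here is a small plain-Python sketch of what such a windowed data set could look like. It is only an analogue of the idea, not the RapidMiner Windowing operator itself; the function name `make_lagged_table` and the choice to drop the first rows are assumptions.

```python
def make_lagged_table(label, attributes, max_lag):
    """Build columns 'att-0' .. 'att-max_lag' aligned with the label.

    Column 'att-k' holds the attribute value k rows before the label row.
    The first max_lag rows are dropped because their lags do not exist.
    """
    n = len(label)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    table = {"label": label[max_lag:]}
    for name, values in attributes.items():
        for lag in range(max_lag + 1):
            table[f"{name}-{lag}"] = values[max_lag - lag : n - lag]
    return table


# Example data set from the post above
label = [5, 6, 7, 8, 9]
attributes = {"att1": [1, 2, 3, 4, 5], "att2": [7, 8, 9, 10, 11]}

table = make_lagged_table(label, attributes, max_lag=2)
print("\t".join(table))  # header row
for row in zip(*table.values()):
    print("\t".join(str(v) for v in row))
```

With two lags the five example rows shrink to three, which shows the price of windowing on short series: each extra lag costs one usable row.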




Please advise which approach you favour and what your experience with time series is like.


Best regards
Sachs

Answers

  • haddockhaddock Member Posts: 849 Maven
    Hi,

    Repugnant self-publicity but you may find this helpful, if you do you probably also need counselling.  ;)

    http://rapid-i.com/rapidforum/index.php/topic,4491.msg16309.html#msg16309

    Best

    H
  • qwertzqwertz Member Posts: 130 Contributor II

    Hi haddock,

    Well, from what I understood, the takeaway messages are:
    - never bet on predicted values
    - better use classification instead of real values
    Important input - thank you for that.

    As you seem to have a lot of experience with time series, may I also ask for your opinion regarding the two different approaches?


    PS: So that is your nickname's background: "he continues to plough a lonely furrow across the  oceans of data"  :)



    Cheers
    Sachs
  • haddockhaddock Member Posts: 849 Maven
    Hi there,

    I ended up making a list of the assumptions involved in the construction of my own process; they underpinned the parameter optimisation, the validation, even the SVM approach itself (which assumes a separation space of constant proportions).

    My natural response to your question is that approach one adds to the assumption base, whereas approach two optimises a necessary parameter, namely the training window; that's also what I did, and is therefore obviously a better approach  ;)

    Best

    H


  • qwertzqwertz Member Posts: 130 Contributor II

    Good afternoon,

    After discussing the theoretical background I would like to focus on the implementation, namely of approach 2.

    As mentioned before, the Windowing operator is going to create lots of lagged copies of every attribute.
    Thanks to Haddock I am aware that it is not advisable to build a model on copies of the same data (this is written somewhere here in the forum, but I cannot find the post now).

    Example:
    1) Data set consists of label, att1 and att2
    2) Windowing provides label, att1-0, att1-1, att1-2, att2-0, att2-1, att2-2
    3) Learner "decides" that att1-0 and att1-2 describe the label best.
    --> and there we have a biased model, because it is based on copies of the same data, which add no additional information

    It would be OK if the learner decided to skip all lags of att2 because they do not contribute to explaining the label. However, the problem is to make sure that no more than one variant of each attribute is considered.
    --> a * att1-x & a * att2-x with a in [0,1] and x in [0,1,2]


    Even if I use an optimizer, it cannot ensure that at most one variant of each attribute remains  ???
    May I ask for any ideas on how to set this up in RapidMiner?


    All the best
    Sachs
  • qwertzqwertz Member Posts: 130 Contributor II

    Do you know these moments when you have a challenging task and you are stuck?
    And then you do something completely different and all of a sudden... there it is :)

    What about this: instead of windowing first, one could loop through all attributes. Within each iteration one can then apply the windowing and finally select the desired time lag (e.g. according to correlation).
    *remark 1: still have to figure out what the best way to do this is*
    Afterwards one could combine the selected attributes of each iteration.
    *remark 2: again not yet sure how to do the combining in this case. But it should work either with postprocessing of the collection provided by the loop operator or with the Remember and Recall operators*


    Now it's time to sleep  :D


    Cheers
    Sachs
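
The loop idea from the last post can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: the lag is picked by absolute Pearson correlation, ties go to the smallest lag, and the names `pearson` and `select_one_lag_per_attribute` as well as the demo series are invented here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # simplification: a constant series is scored as 0
    return cov / sqrt(var_x * var_y)


def select_one_lag_per_attribute(label, attributes, max_lag):
    """Loop over attributes, window each one, keep only its best lag.

    Returns the combined table (label plus one column per attribute)
    and a dict mapping each attribute name to the chosen lag.
    """
    n = len(label)
    target = label[max_lag:]  # rows for which every lag exists
    selected = {"label": target}
    chosen = {}
    for name, values in attributes.items():
        best_lag, best_score, best_column = None, -1.0, None
        for lag in range(max_lag + 1):
            column = values[max_lag - lag : n - lag]
            score = abs(pearson(target, column))
            if score > best_score:  # strict '>' keeps the smallest lag on ties
                best_lag, best_score, best_column = lag, score, column
        selected[f"{name}-{best_lag}"] = best_column
        chosen[name] = best_lag
    return selected, chosen


# Invented demo series: 'lead' runs exactly one step ahead of the label
label = [0, 0, 1, 0, 0, 2, 0, 0, 3, 0]
attributes = {
    "lead": [0, 1, 0, 0, 2, 0, 0, 3, 0, 0],
    "other": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

selected, chosen = select_one_lag_per_attribute(label, attributes, max_lag=2)
print(chosen)
print(list(selected))
```

Because each iteration contributes exactly one column, the combined table can never contain two lags of the same attribute. Whether the best single lag by correlation is also the best input for the learner is a separate question that this sketch does not answer.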