
Influence of adding last index: time series data

Thiru  Member  Posts: 85  Guru
edited August 2020 in Help
Dear all, I'm working on time series data. Please refer to the enclosed process.

1. Currently, I'm generating features using 'Process Windows' with 'Extract Aggregate' as a subprocess. The extracted features are used to train my machine learning model.
2. I've noticed that setting 'adding last index to windows attribute' to yes in the parameters of the Process Windows operator improves the performance of the model drastically, i.e. from 67% to 97% accuracy. The only difference I can see is one extra column among the generated features. I don't understand how this influences the performance of the model.

Is it correct to trust this performance of 97%? And can anyone help me understand the role of adding the last index? Thanks.

Regards,
Thiru
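The setup described above (windowing followed by aggregation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RapidMiner's implementation: `process_windows` and its `add_last_index` flag are hypothetical names, the aggregates are limited to mean, min, and max, and the last index is taken to be the position of each window's final example (in the real operator it would presumably come from the series' index attribute, such as a date). It shows that the flag adds exactly one column, and that this column increases steadily over time, much like a date.

```python
from statistics import mean


def process_windows(values, window_size, step=1, add_last_index=False):
    """Slide a fixed-size window over `values` and aggregate each window.

    Hypothetical stand-in for 'Process Windows' + 'Extract Aggregate';
    not RapidMiner's actual implementation.
    """
    rows = []
    for start in range(0, len(values) - window_size + 1, step):
        window = values[start:start + window_size]
        row = {"mean": mean(window), "min": min(window), "max": max(window)}
        if add_last_index:
            # Assumption: position of the last example in the window.
            # A real index attribute (e.g. a date) would grow over time the same way.
            row["last_index"] = start + window_size - 1
        rows.append(row)
    return rows


series = [3, 5, 4, 6, 8, 7]
without = process_windows(series, window_size=3)
with_idx = process_windows(series, window_size=3, add_last_index=True)
print(len(without), sorted(without[0]))     # 4 ['max', 'mean', 'min']
print(sorted(with_idx[0]))                  # ['last_index', 'max', 'mean', 'min']
print([r["last_index"] for r in with_idx])  # [2, 3, 4, 5]
```

The aggregates describe the shape of each window, while `last_index` only says *when* the window occurred, which is exactly the kind of information a model can memorize.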


Answers

  • mschmitz  Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor  Posts: 2,852  RM Data Scientist
    Hi,
    be careful that you do not overtrain (overfit) your model on dates. It can easily happen that the model learns something like "February was good", which is a rule you do not want to use.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
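    Martin's warning can be illustrated with synthetic data. In this sketch (all data and function names are made up for illustration), the target depends only on the time period, and a 1-nearest-neighbor rule sees nothing but the time index. When test rows are interleaved with training rows from the same periods, the rule looks perfect; when it is evaluated on later periods, it fails. Whether this applies to the process in this thread depends on how its validation splits the windows, which the thread does not show.

```python
def label_for(t):
    # Synthetic target: alternating 30-step "months" of good/bad outcomes.
    return "good" if (t // 30) % 2 == 0 else "bad"


def nearest_neighbor_predict(train, x):
    """1-NN on a single numeric feature; ties go to the earlier training point."""
    _, best_label = min(train, key=lambda item: abs(item[0] - x))
    return best_label


def accuracy(train_ts, test_ts):
    # The only feature is the time index itself, i.e. a date-like column.
    train = [(t, label_for(t)) for t in train_ts]
    hits = sum(nearest_neighbor_predict(train, t) == label_for(t) for t in test_ts)
    return hits / len(test_ts)


ts = range(300)
# Interleaved split: every test point sits next to a training point from the same period.
interleaved = accuracy([t for t in ts if t % 2 == 0], [t for t in ts if t % 2 == 1])
# Chronological split: train on the past, test on the future.
chronological = accuracy([t for t in ts if t < 150], [t for t in ts if t >= 150])
print(interleaved)    # 1.0
print(chronological)  # 0.4
```

A large gap between these two kinds of validation is a hint that the model has learned "when" rather than "what".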
  • Thiru  Member  Posts: 85  Guru
    @mschmitz

    Thanks for your reply. As I understand it, this additional column is just a date value that repeats for every window in this case (correct me if I'm wrong), so I assume it causes overtraining here, but I do not know for sure.
    By the way, what is the purpose of this parameter in the 'Process Windows' operator, and can you give some insight into how it affects the performance of the time series model? Thanks.

    Regards,
    Thiru
  • Thiru  Member  Posts: 85  Guru
    @mschmitz

    The operator 'Process Windows' (or 'Windowing') previously had the parameter 'add last index in windows attribute'. In the current version, 9.8.001, that option is no longer available.

    For the same data and process, I was getting an accuracy of 67%, but now I'm getting 97.8% (and I no longer have the option of using 'add last index').

    I'm not sure I'm doing the right thing. Could you please reconfirm? Thanks.

    Thiru
  • mschmitz  Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor  Posts: 2,852  RM Data Scientist
    Hi @Thiru,
    it's hard to diagnose this without seeing the process. I think we changed the windowing parameters a bit, since you always want to have the last index. Because it is usually a special attribute, it is ignored for learning anyway. Maybe you change its role to regular later on in your process?

    Best,
    Martin
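    The point about special attributes can be sketched as follows. This is a simplified model of the idea, not RapidMiner's API: `Column`, `learner_inputs`, and the role `"id"` assigned to the last index are hypothetical stand-ins (the role the operator actually assigns may differ). Only columns with the regular role reach the learner, so the last index affects the model only if some step changes its role to regular.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    role: str  # "regular", "label", or a hypothetical special role such as "id"


def learner_inputs(columns):
    """Names of the columns a learner would train on: regular attributes only."""
    return [c.name for c in columns if c.role == "regular"]


schema = [
    Column("mean", "regular"),
    Column("max", "regular"),
    Column("last_index", "id"),  # hypothetical special role for the window's last index
    Column("label", "label"),
]
print(learner_inputs(schema))  # ['mean', 'max']

# If a later step sets the last index back to the regular role, it becomes a feature.
schema[2].role = "regular"
print(learner_inputs(schema))  # ['mean', 'max', 'last_index']
```

Checking the roles right before the learner is therefore a quick way to tell whether the index column can be driving the jump in accuracy.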