Influence of adding the last index to the window attribute on time series data

Thiru, Member, Posts: 100, Guru
edited August 2020 in Help
Dear all, I'm working on time series data. Please refer to the enclosed process.

1. Currently, I'm generating features using the 'Process Windows' operator with 'Extract Aggregate' as its subprocess. The extracted features are used to train my machine learning model.
2. I've noticed that choosing 'yes' for 'adding last index to windows attribute' in the parameters of the Process Windows operator improves the performance of the model drastically, i.e. from 67% to 97% accuracy. The only difference I can see is one extra column among the generated features. I'm not able to understand how this influences the performance of the model.

Is it correct to trust this 97% performance? Can anyone help me understand the role of adding the last index? Thanks.
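The windowing-and-aggregation step described in the question can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not RapidMiner's implementation: `process_windows`, the `add_last_index` flag and the mean/min/max aggregates are hypothetical stand-ins for Process Windows plus Extract Aggregate. It only shows why switching the option on adds exactly one extra column, holding the position of each window's last value.

```python
from statistics import mean


def process_windows(series, window_size, step=1, add_last_index=False):
    """Hypothetical stand-in for windowing + aggregation (not RapidMiner code).

    Each window of `window_size` consecutive values becomes one row of
    aggregate features; optionally the index of the window's last value
    is recorded as one extra column.
    """
    rows = []
    for start in range(0, len(series) - window_size + 1, step):
        window = series[start:start + window_size]
        row = {"mean": mean(window), "min": min(window), "max": max(window)}
        if add_last_index:
            # Position of the window's last value: an index, not a measurement.
            row["last_index"] = start + window_size - 1
        rows.append(row)
    return rows


series = [3, 5, 4, 6, 8, 7, 9, 10]
without_idx = process_windows(series, window_size=3)
with_idx = process_windows(series, window_size=3, add_last_index=True)
print(len(without_idx[0]), len(with_idx[0]))  # 3 4
print(with_idx[0])  # {'mean': 4, 'min': 3, 'max': 5, 'last_index': 2}
```

With eight values and a window of three, both calls produce six windows; the only difference is the extra `last_index` column.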



  • jacobcybulski, Member, University Professor, Posts: 391, Unicorn
    As I have no access to your data, I cannot replicate this exactly. The last index in window attribute is special and is added only so that you can retain the index in the new example set (as an ID). Note, however, that since you aggregate your time series and do not use any of the special attributes (except for the label), the last index vanishes anyway, so it has no impact on the result. You must have changed something else in your process. You may also be seeing a random effect from a different mix of data on different runs: to eliminate this, set the random seed in the Split Data and Cross Validation operators and check whether you still get the amazing performance on two runs. Also try simplifying your process (e.g. remove your stacked ensemble) to isolate the effect.
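The two points in the reply, that a special (ID-role) attribute is not a model input and that a fixed seed makes a split reproducible, can be illustrated with a small Python sketch. This is an assumption-laden analogy rather than how RapidMiner works internally: the `ROLES` mapping, `regular_features` and `split_data` are hypothetical names, and `random.Random(seed).shuffle` stands in for the random seed parameter of Split Data.

```python
import random

# Hypothetical role assignment, mimicking the idea that special attributes
# (id, label) are carried along but not used as model inputs.
ROLES = {"last_index": "id", "mean": "regular", "min": "regular",
         "max": "regular", "label": "label"}


def regular_features(row, roles=ROLES):
    """Return only the attributes whose role is 'regular' (the model inputs)."""
    return {name: value for name, value in row.items()
            if roles.get(name) == "regular"}


def split_data(rows, train_ratio=0.7, seed=None):
    """Shuffle-split rows; a fixed seed makes the partition reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Toy windowed rows (made-up values for illustration only).
rows = [{"last_index": i, "mean": float(i), "min": i - 1.0,
         "max": i + 1.0, "label": i % 2} for i in range(10)]

print(regular_features(rows[0]))  # {'mean': 0.0, 'min': -1.0, 'max': 1.0}

# Same seed on two "runs" gives the same partition, so any change in
# performance between runs cannot come from a different data mix.
train_a, test_a = split_data(rows, seed=42)
train_b, test_b = split_data(rows, seed=42)
print(train_a == train_b and test_a == test_b)  # True
```

Comparing two runs only makes sense once the partition is fixed in this way; with `seed=None` each run may train and test on a different mix of windows.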