
Influence of adding last index to window attribute - time series data

Thiru Member Posts: 100 Guru
edited August 2020 in Help
Dear all, I'm working on time series data; please refer to the enclosed process.

1. Currently I'm generating features using the 'Process Windows' operator with an extract aggregates subprocess. The extracted features are then used to train my machine learning model.
2. I've noticed that choosing yes for 'add last index in window attribute' in the parameters of the Process Windows operator improves the performance of the model drastically, i.e. from 67% accuracy to 97% accuracy. The only difference I can see is one extra column in the generated feature set. I'm not able to understand how this influences the performance of the model.

Is it correct to trust this 97% performance, and can anyone help me understand the role of adding the last index? Thanks.
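
To illustrate what I mean by the extra column, here is a rough Python sketch of how I understand the windowing and aggregation to work (this is not RapidMiner's code; the function name, the chosen aggregates and the horizon of 1 are my own assumptions):

```python
import numpy as np
import pandas as pd

def window_features(series, window_size, add_last_index=False):
    """Rough sketch of windowing + aggregation: slide a fixed window
    over the series and turn each window into a row of aggregate
    features. Not RapidMiner code; names and horizon are assumptions."""
    rows = []
    for start in range(len(series) - window_size):
        window = series[start:start + window_size]
        row = {
            "mean": np.mean(window),
            "std": np.std(window),
            "min": np.min(window),
            "max": np.max(window),
            # assumed label: the value right after the window (horizon = 1)
            "label": series[start + window_size],
        }
        if add_last_index:
            # the extra column: position of the last value in each window;
            # it simply counts upwards from one window to the next
            row["last_index"] = start + window_size - 1
        rows.append(row)
    return pd.DataFrame(rows)

# Toy series just to show the shape of the output
ts = np.sin(np.linspace(0, 20, 200))
print(window_features(ts, window_size=10, add_last_index=True).head())
```

In this sketch the only difference with the option switched on is that one monotonically increasing position column.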

Regards,
Thiru

Answers

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    As I have no access to your data, I cannot replicate this exactly. The last-index-in-window attribute is special and is added only so that you can retain the index in the new example set (as an ID). Note, however, that since you aggregate your time series and do not use any of the special attributes (except for the label), the last index vanishes anyway, so there should be no impact on the result. You must have changed something else in your process. You may also have got a random effect from a different mix of data on different runs; to eliminate this, set the random seed in the Split Data and Cross Validation operators and check whether you still get the amazing performance on two runs (see the sketch after this post). Also try simplifying your process (e.g. remove your stacked ensemble) to isolate the effect.
    Jacob
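
    For illustration only, a generic Python/scikit-learn analogue of seeding the splits (not RapidMiner itself; the data here is random stand-in data) might look like this:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold, cross_val_score, train_test_split

    # Stand-in feature table; in the real process these would be the
    # aggregated window features and their labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = rng.integers(0, 2, size=500)

    # Unseeded split: a different mix of examples on every run, so accuracy
    # can jump between runs for reasons unrelated to the features.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)

    # Seeded split + seeded cross-validation: identical partitions on every
    # run, so results with and without the extra column are comparable.
    cv = KFold(n_splits=10, shuffle=True, random_state=1992)
    model = RandomForestClassifier(random_state=1992)
    scores = cross_val_score(model, X, y, cv=cv)
    print("mean CV accuracy:", scores.mean())
    ```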