Feature selection and sliding window validation

Danyo83 Member Posts: 41 Maven
edited November 2018 in Help

I want to apply feature selection (FS) since I have lots of features. I first split the whole dataset into a training and a test set; the test set remains unseen until the very end of the process. On the training set, in order to avoid overfitting, I want to use 10 sliding windows (since it is a time series, this is more appropriate than 10-fold cross-validation). The sliding window validation shall be conducted within the feature selection operator. How does the feature selection work? Does it take the best features from each of the 10 validations and put them all together after the process, or how does it work?

Thanks in advance



  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Daniel,

    The validations, both Cross Validation and Sliding Window Validation, do not create a combined model, combined features, or anything of the sort; they only estimate how well the algorithm or feature selection method works on your data. Once you have validated that the method works and know which performance it yields, you can use the complete dataset, without the validation, to create the final model or feature set.

    To generate a good feature set, you typically have the Select Attributes operator as the top-level operator, with a validation operator on the inside. For each tested feature combination, the validation is executed to estimate the performance of the current feature set, and in the end the feature combination that yields the best performance is returned.
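    As a rough illustration of that setup outside RapidMiner, here is a hedged Python/scikit-learn sketch: a greedy forward feature selection where every candidate feature set is scored by a sliding-window (time-series) validation on the inside. The data, model, and helper names are illustrative assumptions, not anything from this thread.

    ```python
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score

    # Synthetic stand-in data; in practice this would be your training set
    # (the hold-out test set stays outside this whole procedure).
    X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                           noise=5.0, random_state=0)

    # 10 time-ordered splits; pass max_train_size for a fixed-width
    # sliding window instead of the default expanding window.
    tscv = TimeSeriesSplit(n_splits=10)

    def score_subset(features):
        """Inner validation: estimate the performance of one candidate
        feature set. This is the 'validation inside FS' step."""
        scores = cross_val_score(LinearRegression(), X[:, features], y,
                                 cv=tscv, scoring="r2")
        return scores.mean()

    # Greedy forward selection: the inner validation runs for every tested
    # combination; the best-scoring combination is kept in the end.
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        cand = {f: score_subset(selected + [f]) for f in remaining}
        f_best, s_best = max(cand.items(), key=lambda kv: kv[1])
        if s_best <= best_score:
            break  # no remaining feature improves the score; stop
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best

    print("selected features:", selected)
    print("validated score:", round(best_score, 3))
    ```

    Note that the result is a single feature set chosen once over the whole training data, not a merge of per-window feature sets, which matches the answer above.
    
    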

    To validate that the feature selection itself works well for your data, you could wrap it into another validation, giving you the hierarchy validation -> feature selection -> validation. But since you have your hold-out test set, I would consider that overkill.
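    That nested hierarchy can be sketched compactly with scikit-learn's built-in pieces; again, these are analogues chosen for illustration, not RapidMiner operators:

    ```python
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score
    from sklearn.pipeline import Pipeline

    # Illustrative synthetic data.
    X, y = make_regression(n_samples=150, n_features=6, n_informative=2,
                           noise=5.0, random_state=1)

    inner = TimeSeriesSplit(n_splits=5)  # validation inside feature selection
    outer = TimeSeriesSplit(n_splits=5)  # validation of the whole FS process

    pipeline = Pipeline([
        ("fs", SequentialFeatureSelector(LinearRegression(),
                                         n_features_to_select=2, cv=inner)),
        ("model", LinearRegression()),
    ])

    # Each outer fold reruns the feature selection on its own training
    # window only, so the score reflects the full procedure rather than
    # one fixed feature set.
    outer_scores = cross_val_score(pipeline, X, y, cv=outer, scoring="r2")
    print("nested validation scores:", outer_scores.round(3))
    ```
    
    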

    Best regards,