Cross-validation Features

JohnNash2000
Hello, I am currently running cross-validation (CV), and within it, "Forward Selection" is performed during training. How can I output the chosen features once CV has completed? I've tried many approaches, including the "Weights to Data" and "Data to Weights" operators, but neither outputs the chosen features. Does anyone know how I can extract the chosen features from the "Cross Validation" process?

Thank you

Answers

  • JohnNash2000
    Hello @varunm1

    You are 100% correct: there is no final set of features, since each iteration of CV produces its own feature set. You see, I recently read the blog post about contamination ("Avoiding Accidental Contamination of Data [3 Examples]"), so I moved my feature selection process from outside the CV to inside it. When feature selection was outside, I got a single set of features chosen from the entire training data. That was what I was looking for, and I was so focused on finding how to get it that I never stopped to think why it was gone.

    Thank you
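To make the "one feature set per fold" point concrete, here is a minimal Python sketch. It is not RapidMiner's implementation: `forward_selection`, `centroid_accuracy` and `cross_validate` are hypothetical helpers, the nearest-centroid learner and the use of training accuracy as the selection score are assumptions, and the toy data is made up. Selection runs inside each training fold, so CV returns a list of feature lists rather than one final list; a single list for a deployable model comes from running the same selection once on all the data, after CV has estimated how well that whole procedure performs.

```python
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def centroid_accuracy(train_X: List[Row], train_y: List[int],
                      test_X: List[Row], test_y: List[int],
                      features: List[int]) -> float:
    """Nearest-centroid classifier restricted to `features`; returns test accuracy."""
    if not features:
        return 0.0
    centroids: Dict[int, List[float]] = {}
    for label in sorted(set(train_y)):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(x: Row) -> int:
        # Closest class centroid by squared Euclidean distance on the chosen features.
        return min(centroids, key=lambda c: sum(
            (x[f] - m) ** 2 for f, m in zip(features, centroids[c])))

    hits = sum(predict(x) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def forward_selection(X: List[Row], y: List[int]) -> List[int]:
    """Greedy forward selection; scoring by training accuracy is an assumption."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {f: centroid_accuracy(X, y, X, y, selected + [f]) for f in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best:
            break  # no remaining feature improves the score, so stop
        best = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected


def cross_validate(X: List[Row], y: List[int],
                   k: int = 3) -> Tuple[List[List[int]], List[float]]:
    """k-fold CV with forward selection run inside each training fold."""
    fold_features: List[List[int]] = []
    fold_scores: List[float] = []
    for i in range(k):
        train = [j for j in range(len(X)) if j % k != i]
        test = [j for j in range(len(X)) if j % k == i]
        tr_X, tr_y = [X[j] for j in train], [y[j] for j in train]
        te_X, te_y = [X[j] for j in test], [y[j] for j in test]
        feats = forward_selection(tr_X, tr_y)  # sees training rows only
        fold_features.append(feats)
        fold_scores.append(centroid_accuracy(tr_X, tr_y, te_X, te_y, feats))
    return fold_features, fold_scores


# Toy data: feature 0 equals the label; features 1 and 2 are unrelated patterns.
X = [[float(j % 2), float(j % 3), (j % 5) * 0.1] for j in range(30)]
y = [j % 2 for j in range(30)]

per_fold, fold_scores = cross_validate(X, y, k=3)
print("features chosen in each fold:", per_fold)
print("fold accuracies:", fold_scores)
# The single feature set for a final model comes from one run on all the data.
print("features chosen on the full data:", forward_selection(X, y))
```

With this clean toy data every fold happens to pick the same feature; with noisier data the per-fold lists can differ, which is why CV has no single "chosen features" output to extract.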

  • varunm1 (Moderator)
    That's true, @JohnNash2000. If we are validating a model, preprocessing steps such as sampling and feature selection should be applied only on the training side of each fold. If we apply them to the whole dataset, the test rows influence those steps, which biases the model and can overestimate its performance.
    Regards,
    Varun
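This effect can be checked with a small experiment. The Python sketch below is an illustration under stated assumptions, not RapidMiner code: the helpers (`class_means`, `select_feature`, `fold_accuracy`, `cv_accuracy`) are hypothetical, and a simple filter (keep the one feature with the widest gap between class means) stands in for Forward Selection to keep it short. All 200 features are random noise, so an honest estimate should sit near 0.5. Selecting on the whole dataset lets every fold's test rows help pick the feature, which typically pushes the leaky estimate above that; the exact numbers depend on the random seed.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Row = Sequence[float]


def class_means(X: List[Row], y: List[int], rows: Sequence[int], f: int) -> Tuple[float, float]:
    """Mean of feature f for class 0 and class 1, computed from `rows` only."""
    zeros = [X[r][f] for r in rows if y[r] == 0]
    ones = [X[r][f] for r in rows if y[r] == 1]
    return sum(zeros) / len(zeros), sum(ones) / len(ones)


def select_feature(X: List[Row], y: List[int], rows: Sequence[int]) -> int:
    """Filter-style stand-in for feature selection: widest class-mean gap wins."""
    def gap(f: int) -> float:
        m0, m1 = class_means(X, y, rows, f)
        return abs(m1 - m0)
    return max(range(len(X[0])), key=gap)


def fold_accuracy(X: List[Row], y: List[int], train: List[int],
                  test: List[int], f: int) -> float:
    """Nearest-class-mean classifier on feature f: fit on `train`, score on `test`."""
    m0, m1 = class_means(X, y, train, f)
    hits = 0
    for r in test:
        pred = 1 if abs(X[r][f] - m1) < abs(X[r][f] - m0) else 0
        hits += int(pred == y[r])
    return hits / len(test)


def cv_accuracy(X: List[Row], y: List[int], k: int,
                leaky: bool) -> Tuple[float, List[Set[int]]]:
    """k-fold CV; leaky=True selects the feature once on ALL rows before splitting."""
    everything = list(range(len(X)))
    global_f: Optional[int] = select_feature(X, y, everything) if leaky else None
    scores: List[float] = []
    seen: List[Set[int]] = []  # rows the selector looked at, per fold
    for i in range(k):
        train = [j for j in everything if j % k != i]
        test = [j for j in everything if j % k == i]
        if global_f is not None:
            f = global_f  # this fold's test rows already influenced the choice
            seen.append(set(everything))
        else:
            f = select_feature(X, y, train)  # training side only
            seen.append(set(train))
        scores.append(fold_accuracy(X, y, train, test, f))
    return sum(scores) / k, seen


# Pure-noise features: none of them carries real information about the labels.
rng = random.Random(0)
N, P, K = 40, 200, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N)]
y = [j % 2 for j in range(N)]

proper_acc, _ = cv_accuracy(X, y, K, leaky=False)
leaky_acc, _ = cv_accuracy(X, y, K, leaky=True)
print(f"selection inside each fold:  {proper_acc:.2f}")
print(f"selection on the whole data: {leaky_acc:.2f}")
```

The returned `seen` sets make the difference explicit: in the proper variant the selector never touches a fold's test rows, while in the leaky variant it has seen every row before any fold is scored.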
