RapidMiner 9.7 is Now Available
Lots of amazing new improvements including true version control! Learn more about what's new here.
Feature selection operator: final feature set problem
I have 2 questions about FEATURE SELECTION operator.
1. After running the algorithm it generates a few feature sets but selects a certain feature set according to 'balance to accuracy' parameter. For example, this is the Pareto front from which a feature set is chosen for balance = 0.8:
This certain set has 7 features and I may want to also check the bigger (15) or the smaller (5) sets to compare. Is there an easy way to access other feature sets here? Otherwise to obtain another set I have to change the balance parameter and run the process over again, which takes time.
2. This is part of my process which uses feature selection:
So, what am I doing here:
- divide the whole data into training and testing sets using time series variable ( on a time axis : ==== train ==== | == test ==> )
- perform feature selection on training set
- apply selected features to both subsets
- train GLM model on train set
- apply GLM model on test set
Why this happens? Can it happen that these 2 features relevant to training set only, but not relevant to test set (remember I use not random but consecutive split of data)?
More generally, is it the correct approach I am using here, or should I always run feature selection algorithm on the full data?