Feature selection operator: final feature set problem

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

I have two questions about the Feature Selection operator.

1. After the algorithm runs, it generates several feature sets but selects one of them according to the 'balance to accuracy' parameter. For example, this is the Pareto front from which a feature set is chosen for balance = 0.8:

The chosen set has 7 features, and I may also want to check the larger (15) or the smaller (5) sets for comparison. Is there an easy way to access the other feature sets here? Otherwise, to obtain another set I have to change the balance parameter and run the process again, which takes time.
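To make the request concrete, here is a minimal Python sketch of what "accessing the other sets" could look like if the whole Pareto front were kept after a single run. It is not RapidMiner's API: the `Candidate` class, the `pick_by_balance` scoring formula, the feature names and the error values are all made-up assumptions, chosen only so that a balance of 0.8 lands on the 7-feature set as in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    features: tuple  # names of the selected attributes (hypothetical)
    error: float     # validation error for this set (made-up values below)


def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = len(a.features) <= len(b.features) and a.error <= b.error
    better = len(a.features) < len(b.features) or a.error < b.error
    return no_worse and better


def pareto_front(candidates):
    """Keep only the non-dominated sets, ordered by number of features."""
    front = [c for c in candidates
             if not any(dominates(o, c) for o in candidates)]
    return sorted(front, key=lambda c: len(c.features))


def pick_by_balance(front, balance):
    """Assumed scoring, not RapidMiner's: weight normalized error by
    `balance` and normalized set size by (1 - balance); lowest score wins."""
    max_size = max(len(c.features) for c in front)
    max_err = max(c.error for c in front)

    def score(c):
        return (balance * c.error / max_err
                + (1 - balance) * len(c.features) / max_size)

    return min(front, key=score)


def pick_by_size(front, n):
    """Return the front member with exactly n features, or None."""
    return next((c for c in front if len(c.features) == n), None)


def make(n, error):
    # Helper for the toy data: n placeholder feature names f0..f(n-1).
    return Candidate(tuple(f"f{i}" for i in range(n)), error)


# Toy candidates; the 10-feature set is dominated by the 7-feature one.
candidates = [make(5, 0.30), make(7, 0.22), make(10, 0.25), make(15, 0.20)]
front = pareto_front(candidates)

chosen = pick_by_balance(front, balance=0.8)  # the set picked automatically
bigger = pick_by_size(front, 15)              # inspected without a rerun
smaller = pick_by_size(front, 5)
```

The point of the sketch is only that the expensive search happens once, while choosing a different point on the front afterwards is a cheap lookup.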

2. This is part of my process which uses feature selection:

So, here is what I am doing:
  • divide the whole data set into training and test sets using the time series variable (on a time axis: ==== train ==== | == test ==>)
  • perform feature selection on the training set
  • apply the selected features to both subsets
  • train a GLM model on the training set
  • apply the GLM model to the test set
I end up with a couple of features that are chosen by the selection algorithm and included in the final feature set; however, when I apply this feature set to the test data and run the GLM model, these features have zero weights, so they are not even included in the final regression model.

Why does this happen? Can it be that these 2 features are relevant to the training set only, but not to the test set (remember that I use a consecutive, not random, split of the data)?
More generally, is the approach I am using here correct, or should I always run the feature selection algorithm on the full data?
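
For anyone who wants to reproduce the setup outside RapidMiner, the workflow described above can be sketched in plain Python. This is only an approximation under stated assumptions: a simple correlation filter stands in for the Feature Selection operator's search, an unregularized least-squares fit (a Gaussian GLM with identity link) stands in for the GLM operator, and the column names `t`, `x1`, `x2`, `x3`, `y` and the synthetic data are invented for illustration.

```python
import random
from statistics import mean


def chronological_split(rows, train_fraction=0.7):
    """Consecutive split on the time axis: earliest rows train, latest test."""
    ordered = sorted(rows, key=lambda r: r["t"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def select_features(train, candidates, target, k):
    """Stand-in selector (not RapidMiner's algorithm): keep the k features
    with the highest |correlation| to the target, using training rows only."""
    ys = [r[target] for r in train]
    ranked = sorted(
        candidates,
        key=lambda f: abs_correlation([r[f] for r in train], ys),
        reverse=True,
    )
    return ranked[:k]


def fit_linear(train, features, target, lr=0.1, epochs=500):
    """Least-squares linear model fitted by batch gradient descent."""
    w = {f: 0.0 for f in features}
    b = 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = {f: 0.0 for f in features}
        grad_b = 0.0
        for r in train:
            err = b + sum(w[f] * r[f] for f in features) - r[target]
            grad_b += err
            for f in features:
                grad_w[f] += err * r[f]
        b -= lr * grad_b / n
        for f in features:
            w[f] -= lr * grad_w[f] / n
    return w, b


def predict(model, row):
    """Score a row using only the features the model was trained on."""
    w, b = model
    return b + sum(w[f] * row[f] for f in w)


# Synthetic time-ordered data: y depends on x1 and x2; x3 is pure noise.
rng = random.Random(0)
rows = []
for t in range(200):
    x1, x2, x3 = (rng.uniform(-1, 1) for _ in range(3))
    rows.append({"t": t, "x1": x1, "x2": x2, "x3": x3,
                 "y": 2 * x1 - x2 + rng.gauss(0, 0.1)})

train, test = chronological_split(rows)                      # split by time
selected = select_features(train, ["x1", "x2", "x3"], "y", k=2)  # train only
model = fit_linear(train, selected, "y")                     # train the model
test_mse = mean((predict(model, r) - r["y"]) ** 2 for r in test)  # apply
```

Note that in this sketch both the selected features and the model weights are determined from the training rows alone; the test rows are only scored.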



