Feature selection operator: final feature set problem

kypexin · August 2019

Hi,

I have two questions about the FEATURE SELECTION operator.

1. After the algorithm runs, it generates several feature sets but selects one particular set according to the 'balance to accuracy' parameter. For example, this is the Pareto front from which a feature set is chosen for balance = 0.8:

This particular set has 7 features, and I may also want to check the larger (15) or the smaller (5) sets for comparison. Is there an easy way to access the other feature sets here? Otherwise, to obtain another set I have to change the balance parameter and run the process all over again, which takes time.

2. This is part of my process which uses feature selection:

Image: https://us.v-cdn.net/6030995/uploads/editor/mw/gmlwe6bcn28a.png

So, this is what I am doing here:

divide the whole data set into training and test sets along the time variable (on a time axis: ==== train ==== | == test ==>)
perform feature selection on the training set
apply the selected features to both subsets
train a GLM model on the training set
apply the GLM model to the test set
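The first three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the RapidMiner process itself: the helper names (`time_ordered_split`, `select_features`, `project`), the toy data, and the correlation-based selector are all hypothetical stand-ins, and the GLM training and scoring steps are left out. The point it shows is that the split is a single cut on the time axis and that the selector only ever sees the training rows.

```python
from statistics import mean

def time_ordered_split(rows, time_key, train_fraction=0.7):
    """Sort by time and cut once, so every test row is later than every training row."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def select_features(train_rows, candidates, target, k):
    """Stand-in selector (assumption): keep the k candidates most correlated
    with the target, measured on the training rows only."""
    ys = [r[target] for r in train_rows]
    ranked = sorted(
        candidates,
        key=lambda f: abs_correlation([r[f] for r in train_rows], ys),
        reverse=True,
    )
    return ranked[:k]

def project(rows, features, target):
    """Keep only the selected feature columns plus the target."""
    return [{name: r[name] for name in features + [target]} for r in rows]

# Made-up toy data: x1 drives y, x2 is a weak periodic signal, x3 is constant.
rows = [
    {"t": t, "x1": float(t), "x2": float((t * 7) % 5), "x3": 1.0, "y": 2.0 * t + 1.0}
    for t in range(20)
]

train, test = time_ordered_split(rows, "t", train_fraction=0.7)
selected = select_features(train, ["x1", "x2", "x3"], "y", k=2)

# The same column list is applied to both subsets; the test rows never
# influence which features are chosen.
train_sel = project(train, selected, "y")
test_sel = project(test, selected, "y")
```

A model would then be fitted on `train_sel` and scored on `test_sel`. Because the selector and the model use different criteria, nothing guarantees that every selected column ends up with a non-zero weight.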

I end up with a couple of features that were chosen by the selection algorithm and included in the final feature set; however, when I apply this feature set to the test data and run the GLM model, these features have zero weights, so they are not even included in the final regression model.

Why does this happen? Could it be that these 2 features are relevant to the training set only, but not to the test set (remember that I use a consecutive rather than a random split of the data)?
More generally, is the approach I am using here correct, or should I always run the feature selection algorithm on the full data?

Thanks.

IngoRM · August 2019

Hi,

On the first question: the second port ("population") delivers a collection of all feature sets. You can pick individual feature sets out of this collection with the Select operator.
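As a rough analogy in Python (not RapidMiner code), the "population" output can be thought of as a list of feature sets that you index into, instead of re-running the selection with a different balance. The dictionary layout, the `pick_by_size` helper, and the error values below are made up for illustration; only the set sizes 5, 7 and 15 come from the question.

```python
def pick_by_size(population, n_features):
    """Return the lowest-error feature set with exactly n_features, or None."""
    matches = [fs for fs in population if len(fs["features"]) == n_features]
    return min(matches, key=lambda fs: fs["error"]) if matches else None

# Hypothetical population: one entry per feature set on the Pareto front.
# The error values are invented placeholders.
population = [
    {"features": [f"f{i}" for i in range(5)], "error": 0.30},
    {"features": [f"f{i}" for i in range(7)], "error": 0.25},
    {"features": [f"f{i}" for i in range(15)], "error": 0.24},
]

smaller = pick_by_size(population, 5)
chosen = pick_by_size(population, 7)
bigger = pick_by_size(population, 15)

# Picking by position, which is closer to selecting an entry by its index
# in a collection.
first_set = population[0]
```

Each picked set can then be applied to the training and test data in turn, so the alternatives are compared without repeating the selection run.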

On the second question: the general setup looks good. And yes, it can still happen that selected features are dropped again by the learner. It is likely that the AFE would eventually have deselected them to reduce complexity further; it just had not happened (yet). This is more likely if the selected feature set is on or close to a vertical in the Pareto front, BTW.

Cheers,

Ingo

kypexin · August 2019

Hi @IngoRM -- thanks a lot, this helped!
