Feature selection

keith Member Posts: 157 Guru
edited November 2018 in Help
When using GeneticAlgorithm to do feature selection, is it necessary to use the same learner as you intend to use in the final model, or can you use a less compute-intensive learner to do feature selection, and save the more time-consuming learner for the final model building?

My situation:

I have an example set with about 18,000 rows and 37 or so attributes. There are several sets of 3-4 attributes that are highly correlated with one another. All of the attributes are numerical (in some cases via Nominal2Numerical). I am building a regression model (predicting a numerical label).

My intent is to use genetic feature selection to choose some optimal subset of attributes and then build a NearestNeighbor or W-LWL model with attribute weights to scale the relative importance of each feature.  EvolutionaryWeighting would be used to determine the best weights.

Evaluating a KNN or LWL model is time-consuming, and both the feature selection and the evolutionary weighting steps involve many iterations of model evaluation. The entire process takes far more time than I would like.

My thought was to use a simpler learner (say, LinearRegression) for feature selection to save time, and only use the KNN/LWL model in the subsequent EvolutionaryWeighting step to fine-tune the actual desired model.  But I'm concerned that using different learners in the two different stages will yield invalid results.
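To make the two-stage idea concrete, here is a rough sketch in scikit-learn terms (assumptions: the data is synthetic, and a greedy SequentialFeatureSelector stands in for the genetic search — the point is only that the cheap linear learner drives the selection, while the expensive k-NN model is fit once on the surviving attributes):

```python
# Stage 1: feature selection driven by a cheap LinearRegression.
# Stage 2: the expensive KNeighborsRegressor sees only the selected columns.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the real example set (37 attributes, 8 informative).
X, y = make_regression(n_samples=1000, n_features=37, n_informative=8,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: wrapper-style selection, many cheap LinearRegression fits.
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=8, cv=3)
selector.fit(X_train, y_train)

# Stage 2: fit the slow learner once, on the reduced attribute set.
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(selector.transform(X_train), y_train)
score = knn.score(selector.transform(X_test), y_test)
```

Whether the subset chosen by the linear model is also good for k-NN is exactly the open question here — the sketch only shows where the compute savings would come from.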

If this is an acceptable approach, do you have any recommendations for which regression learners to try for feature selection?



  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Keith,
    unfortunately your approach will probably fail, because different learning algorithms discover different dependencies between the attributes. But if you have a regression problem, linear regression is always worth a try, and the attributes do not need to be weighted at all, because linear regression does that internally. Since linear regression estimates only as many coefficients as there are attributes, it can even boost performance if you construct new features that combine existing ones. FeatureConstruction might be used for this.
    To prevent overfitting, you could then perform a feature selection, throwing out unnecessary features.

  • keith Member Posts: 157 Guru
    Thanks Sebastian, I had a feeling that what I wanted to do wouldn't work, but it's good to have it confirmed. 

    I have been trying linear regression too, but from previous work it's known that local regression or nearest neighbors are good models for the type of problem I'm working on, so I was hoping to find an efficient way to combine them with feature selection.
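Sebastian's feature-construction suggestion could be sketched like this in scikit-learn terms (an illustrative stand-in, not a RapidMiner process: PolynomialFeatures plays the role of FeatureConstruction, and LassoCV's L1 penalty acts as the subsequent feature selection by driving the coefficients of unnecessary features to zero; the data is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic regression data standing in for the real example set.
X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=1)

# Construct pairwise interaction features, then let LassoCV's L1 penalty
# zero out the coefficients of features that don't help -- a built-in
# feature selection that limits overfitting on the enlarged feature set.
model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LassoCV(cv=3),
)
score = cross_val_score(model, X, y, cv=3).mean()
```

The combination of constructed features plus a sparsity-inducing linear model is one way to get the "weighting happens inside the learner" behavior Sebastian describes without a separate weighting stage.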
