"Feature selection: Combining forward selection and backward elimination"

alimay Member Posts: 6 Contributor II
edited May 2019 in Help
Hi everyone,

I'm looking for an efficient way to combine the attribute subsets coming from a forward selection and a backward elimination performed on the same data set.

So what I want to do is the following:
I retrieve a single data set and split it into two partitions (70% and 30%). I want to perform the two different types of feature selection mentioned above on the first partition and then somehow combine the attributes selected by these two operators. I am not sure how to do this combination (whether I should take the intersection, do some sort of voting, etc., so any suggestion is more than welcome), but I want to end up with a single attribute set (which I then want to "remember": please see http://rapid-i.com/rapidforum/index.php/topic,3631.0.html). Can anyone share some ideas about this?
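Once the two attribute subsets are known, the combination step itself is simple. Here is a minimal sketch in plain Python (the attribute names and subsets are purely illustrative, this is not a RapidMiner operator):

```python
# Minimal sketch: three ways to combine the attribute subsets returned by
# forward selection (FS) and backward elimination (BE) on the same partition.
# The attribute names below are hypothetical placeholders.

fs_selected = {"att1", "att3", "att7"}   # attributes kept by forward selection
be_selected = {"att1", "att3", "att9"}   # attributes kept by backward elimination

# 1) Intersection: only attributes both wrappers agree on (conservative).
combined_strict = fs_selected & be_selected

# 2) Union: every attribute either wrapper kept (liberal).
combined_loose = fs_selected | be_selected

# 3) Voting: with only two "voters" this reduces to the intersection, but the
#    scheme generalises if more selection runs are added (e.g. on bootstraps).
votes = {}
for subset in (fs_selected, be_selected):
    for att in subset:
        votes[att] = votes.get(att, 0) + 1
combined_vote = {att for att, v in votes.items() if v >= 2}

print(combined_strict, combined_loose, combined_vote)
```

The intersection is the conservative choice (it may leave you with very few attributes), the union the liberal one; which works better presumably depends on how strongly the two wrappers disagree on your data.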

I also have another question. I want to do the above for two different data sets which have the same samples but different attributes. Both sets contain around 100 samples; one of them has ~9000 attributes, while the other has ~3000. The label is a polynominal attribute (3 classes).

My fear is that with forward selection and backward elimination on data sets like these (with a huge number of attributes), very early stopping might take place: forward selection might stop once, say, the 100th attribute is added, even though much more important attributes remain in the unselected set, and backward elimination has the same problem the other way around. I think the speculative rounds and stopping behaviour parameters are designed specifically to counter this weakness of the greedy heuristic, but choosing these parameters is difficult without a good sense of the data, and using parameter optimization is extremely CPU-time intensive.

Still, can anyone suggest a proper way to decide the alpha value when choosing the stopping behaviour "without significant increase" (for forward selection) or "without significant decrease" (for backward elimination)? Because I think (correct me if I'm wrong) this is one of the best ways to overcome the very early stopping problem I mentioned.
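To illustrate what I mean by speculative rounds and alpha, here is a hand-rolled sketch in Python with scikit-learn (not a RapidMiner operator; the learner, alpha and patience values are assumptions for illustration). The search only stops after several consecutive rounds without a significant increase:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def forward_selection(X, y, alpha=0.01, patience=3, cv=5):
    """Greedy forward selection that keeps searching through `patience`
    speculative rounds before stopping on 'no significant increase' (alpha)."""
    remaining = list(range(X.shape[1]))
    selected, best_score, bad_rounds = [], -np.inf, 0
    while remaining and bad_rounds < patience:
        # Evaluate every remaining attribute added to the current subset.
        round_score, best_j = max(
            (cross_val_score(GaussianNB(), X[:, selected + [j]], y, cv=cv).mean(), j)
            for j in remaining
        )
        selected.append(best_j)
        remaining.remove(best_j)
        if round_score - best_score > alpha:   # significant increase: reset counter
            bad_rounds = 0
        else:                                  # speculative round without real gain
            bad_rounds += 1
        best_score = max(best_score, round_score)
    # In practice the attributes added during the final unproductive
    # speculative rounds would be dropped again before returning.
    return selected

# Hypothetical usage on the 70% partition:
# selected = forward_selection(X_train_70, y_train_70)
```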

Thank you very much for your comments in advance.

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    ... and one of them contains ~9000 attributes, while the other contains ~3000 attributes.
    Actually, I would probably not go for an FS / BE wrapper approach at all with this number of features. In my experience, the risk of getting stuck in a local optimum only increases with a higher number of features, so you would have to allow a couple of speculative rounds without an increase in performance, and that probably takes way too much time when every performance estimation is a k-fold cross validation with k training runs. Maybe the new feature selection extension by Benjamin Schowe would be interesting for you? He will present it at the RCOMM in Dublin this year, and maybe you want to participate to learn more and discuss options with him. He has also posted in the forum a couple of times already.
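    Just to put a rough number on the effort, a back-of-the-envelope sketch (the attribute count comes from your post; the number of rounds and folds are only assumptions):

    ```python
    # Back-of-the-envelope cost of a forward-selection wrapper on the larger set.
    attributes = 9000   # candidate attributes (from the question)
    rounds     = 100    # assume the search runs for 100 rounds
    k          = 10     # assume 10-fold cross validation per candidate subset

    # Each round evaluates roughly `attributes` candidate subsets (slightly
    # fewer as attributes get used up), and each evaluation trains k models.
    trainings = rounds * attributes * k
    print(f"~{trainings:,} model trainings")   # ~9,000,000
    ```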

    Sorry, I don't know an optimal method for determining alpha right now. However, finding the optimal alpha might only be your "first" problem here; in the end, I am afraid that even with an optimal alpha the necessary runtime is much too high...

    Cheers,
    Ingo