"Genetic algorithm for feature selection"

kersor Member Posts: 26 Maven
edited June 2019 in Help

Dear community

 

I want to use the Optimize Selection (Evolutionary) operator, i.e. the genetic algorithm, for feature selection on a data set with numeric attributes. If possible, I would like to know how exactly this algorithm works in theory. More particularly: are the features selected independently of the classifier's accuracy, or is a subset of features selected so that the classifier's performance does not degrade? I think it is the second, but I want to be sure.

 

The XML of my process is attached.

 

Best Regards

Konstantinos

 

 

 

Best Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hi Konstantinos,

     

    I think my PhD thesis might actually be a good place to read up on the feature selection part of RapidMiner. The middle part covers both the single-objective and the multi-objective evolutionary optimization approaches. I typically recommend going with a multi-objective approach, where you try to maximize the prediction accuracy on the one hand and minimize the number of features on the other.

     

    The link to my PhD is here: http://www-ai.cs.uni-dortmund.de/PublicPublicationFiles/mierswa_2008a.pdf
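
As a rough illustration of the multi-objective idea mentioned above (not taken from the thesis or from RapidMiner itself), the sketch below keeps only the feature subsets that are not dominated on the two objectives, accuracy and number of features. The candidate subsets, their accuracies, and the attribute names `a1` to `a5` are made-up values for illustration.

```python
from typing import List, Tuple

# (accuracy, number of features, feature subset); all values are hypothetical.
Candidate = Tuple[float, int, Tuple[str, ...]]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates: List[Candidate] = [
    (0.90, 5, ("a1", "a2", "a3", "a4", "a5")),
    (0.89, 2, ("a1", "a3")),
    (0.85, 3, ("a2", "a4", "a5")),  # worse than ("a1", "a3") on both objectives
    (0.80, 1, ("a1",)),
]

front = pareto_front(candidates)
for acc, n, subset in front:
    print(f"accuracy={acc:.2f} features={n} subset={subset}")
```

The result is a set of trade-offs rather than a single winner, so the choice between, say, 0.90 accuracy with five features and 0.89 with two is left to the analyst.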

     

    Whether the Optimize Selection operator bases the selection on a generic feature relevance scheme or on a specific learner depends on how you build the process.  If you put a cross-validation with a certain learner, let's say Naive Bayes, inside the Optimize Selection operator, then the feature selection is optimized for the accuracy of this particular learner.  This process in the Sample repository delivered with RapidMiner shows how this works in general:

     

    //Samples/processes/04_Attributes/10_EvolutionaryFeatureSelection
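
Outside of RapidMiner, the wrapper idea described above (a cross-validated learner used as the fitness function of an evolutionary search) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the operator's actual implementation: the data is synthetic, a nearest-centroid classifier stands in for Naive Bayes, and the population size, number of generations, and mutation rate are arbitrary illustrative choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
N_FEATURES = 6


def make_data(n=120):
    """Synthetic data: features 0 and 1 carry the class signal, the rest are noise."""
    rows = []
    for i in range(n):
        label = i % 2
        x = [random.gauss(2.0 * label, 1.0) if j < 2 else random.gauss(0.0, 1.0)
             for j in range(N_FEATURES)]
        rows.append((x, label))
    random.shuffle(rows)
    return rows


def centroid_accuracy(train, test, mask):
    """Fit and score a nearest-centroid learner on the selected features only."""
    idx = [j for j, keep in enumerate(mask) if keep]
    cents = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        cents[label] = [sum(p[j] for p in pts) / len(pts) for j in idx]

    def predict(x):
        return min(cents, key=lambda c: sum((x[j] - m) ** 2
                                            for j, m in zip(idx, cents[c])))

    return sum(predict(x) == y for x, y in test) / len(test)


def fitness(mask, data, folds=5):
    """Wrapper fitness: cross-validated accuracy of the inner learner."""
    if not any(mask):
        return 0.0  # an empty feature set cannot be evaluated
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [r for i, r in enumerate(data) if i % folds != f]
        scores.append(centroid_accuracy(train, test, mask))
    return sum(scores) / folds


def evolve(data, pop_size=12, generations=15, p_mut=0.1):
    """Simple genetic algorithm over boolean feature masks."""
    pop = [[random.random() < 0.5 for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(m, data), reverse=True)
        nxt = scored[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            # tournament selection of two parents
            a = max(random.sample(scored, 3), key=lambda m: fitness(m, data))
            b = max(random.sample(scored, 3), key=lambda m: fitness(m, data))
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not g) if random.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=lambda m: fitness(m, data))


data = make_data()
best_mask = evolve(data)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
print("cross-validated accuracy:", round(fitness(best_mask, data), 3))
```

Because the fitness is the accuracy of one particular learner, swapping that learner for another can lead to a different selected subset, which is the dependence on process design described above.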

     

    Hope that helps,

    Ingo

  • kersor Member Posts: 26 Maven
    Solution Accepted

    Thank you so much for your reply, you helped me a lot.

     

    Best Regards

Answers

  • reinaldo_gregor Member Posts: 1 Contributor I

    Ingo

    I have just started to dive into your thesis and wanted to congratulate you on the evidently very thorough work. Even though my background in statistics isn't that good (I just learned what I needed to complete my PhD in demography/economics), I am often disappointed by the way many machine learning experts fail to link the algorithms with their mathematical and statistical fundamentals. That doesn't help to narrow the gap between ML and traditional stats/econometrics. And of course the cherry on top is your statement about meaningless statistical facts on page 7. Thanks for sharing this with us!
