feature selection

ramzanzadeh72 Member Posts: 14 Contributor I
edited December 2018 in Help

Hi,

I have a data set with 46 attributes, and I want to select a feature set that has:

1) maximum relevance to the class attribute

2) minimum redundancy

3) the minimum number of features

4) the best performance (e.g. accuracy + f_measure + AUC)

What should I do to achieve this?

Answers

  • FBT Member Posts: 106 Unicorn

    You may want to take a look at this tutorial here, written by @Thomas_Ott.

    It gives a good introduction to feature selection in RM, with a focus on the two standard methods: forward selection and backward elimination.

     

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @ramzanzadeh72,

     

    In addition to Thomas's tutorial, you can take a look at this thread.

     

    Regards,

     

    Lionel

  • ramzanzadeh72 Member Posts: 14 Contributor I

    Dear FBT,

    Thanks for your answer.

    I use the mRMR method, but it only considers a combined score: relevance - redundancy or relevance / redundancy. That is, a feature set may be highly redundant and still be selected because it has the best relevance - redundancy or relevance / redundancy score. With the mRMR algorithm, I need to select a feature set that has the highest relevance, the lowest redundancy and the best performance, and that also has the minimum number of features.
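
    To make the point about the combined score concrete, here is a minimal sketch of greedy mRMR with the difference criterion (relevance - redundancy), assuming discrete (already binned) features. It is not RapidMiner's implementation; the function names `mutual_information` and `mrmr` and the toy data are made up for illustration. It reports relevance and redundancy separately for every pick, so each component can be inspected on its own.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR using the difference criterion (relevance - redundancy).

    features: dict mapping a feature name to its list of discrete values.
    Returns (name, relevance, mean_redundancy) tuples in selection order.
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        best = None
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            if selected:
                # Mean mutual information with the features already chosen.
                redundancy = sum(
                    mutual_information(features[name], features[chosen])
                    for chosen, _, _ in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if best is None or score > best[0]:
                best = (score, name, redundancy)
        _, name, redundancy = best
        selected.append((name, relevance[name], redundancy))
        remaining.remove(name)
    return selected


# Toy data (hypothetical): "a_copy" duplicates "a", "b" is unrelated to the class.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}

for name, rel, red in mrmr(features, target, k=2):
    print(f"{name}: relevance={rel:.3f} mean_redundancy={red:.3f}")
```

    On this toy data the second pick is "b", which has zero relevance, simply because the duplicate "a_copy" is penalised more heavily for redundancy. That is the kind of trade-off the single combined score hides, and why looking at the two components separately can help.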

  • FBT Member Posts: 106 Unicorn

    Is the order of the requirements in your original post sorted by importance? I.e. do you care more about model training time (i.e. the minimum number of features) or about accuracy? I would have thought that 46 attributes are actually not that many in terms of compute time, but this of course depends on the wider context.

     

    Having said that, both responses you received will point you in the right direction. Both operators (Backward Elimination and Forward Selection) basically allow you to define your maximum number of attributes. Hence, running either of the two within a Cross Validation will satisfy your requirements 3 and 4 (although you have to decide manually, based on the results, what a sensible minimum number of features is).
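
    For readers who want to see the idea outside of RapidMiner, the following is a rough sketch of greedy forward selection with a cap on the number of attributes. It is not the Forward Selection operator itself. The `evaluate` callback is an assumption: it stands in for whatever cross-validated performance estimate you use. `toy_evaluate` and the feature names are invented purely so the example runs.

```python
def forward_selection(all_features, evaluate, max_features, min_gain=0.0):
    """Greedy forward selection.

    evaluate(subset) must return a score where higher is better, for example
    a cross-validated performance estimate. Stops at max_features, or when the
    best remaining candidate improves the score by no more than min_gain.
    """
    selected, best_score = [], float("-inf")
    while len(selected) < max_features:
        round_best = None
        for feature in all_features:
            if feature in selected:
                continue
            score = evaluate(selected + [feature])
            if round_best is None or score > round_best[0]:
                round_best = (score, feature)
        if round_best is None or round_best[0] - best_score <= min_gain:
            break
        best_score = round_best[0]
        selected.append(round_best[1])
    return selected, best_score


# Hypothetical stand-in for a cross-validated score: each feature belongs to
# an "information group", and duplicates within a group add nothing.
TOY_GAIN = {"f1": 0.20, "f2": 0.10, "f3": 0.02, "f1_dup": 0.20}
TOY_GROUP = {"f1": "g1", "f1_dup": "g1", "f2": "g2", "f3": "g3"}


def toy_evaluate(subset):
    groups = {}
    for feature in subset:
        group = TOY_GROUP[feature]
        groups[group] = max(groups.get(group, 0.0), TOY_GAIN[feature])
    return 0.5 + sum(groups.values())


selected, score = forward_selection(
    list(TOY_GAIN), toy_evaluate, max_features=3, min_gain=0.05
)
print(selected, round(score, 3))
```

    The `min_gain` threshold is one possible way to automate the "sensible minimum number of features" decision: selection stops as soon as an extra attribute no longer pays for itself.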

     

    In terms of your requirements 1 and 2, I would probably build in a Log operator to see in detail the effect of a specific feature and parameter selection. 

     

    Once you have a basic understanding of how your features affect your model, you'll then need to play around a bit and tweak parameters to try to find the optimum for your circumstances.

  • ramzanzadeh72 Member Posts: 14 Contributor I

    Actually, I'm working on a Twitter data set, and 46 attributes in a huge data set like Twitter may take a lot of time, so a minimum number of features is important here. Also, not just accuracy but accuracy + f_measure + AUC matters, because I'm working on detecting bot accounts on Twitter, so performance is important too. To select the minimum number of features for detection, we should consider relevance, redundancy and the performance metrics together. What can I do in this situation?
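
    As an illustration of what "accuracy + f_measure + AUC" could look like as a single number to optimise, here is a small sketch in plain Python. The equal-weight mean, the 0.5 decision threshold, the function names and the toy labels and scores are all assumptions made for the example, not something prescribed by RapidMiner.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combined_performance(y_true, scores, threshold=0.5):
    """Accuracy, F-measure, AUC and their unweighted mean (an assumed choice)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    result = {
        "accuracy": accuracy(y_true, y_pred),
        "f_measure": f_measure(y_true, y_pred),
        "auc": auc(y_true, scores),
    }
    result["mean"] = sum(result.values()) / 3
    return result


# Toy labels (1 = bot, 0 = human) and classifier scores, invented for the demo.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(combined_performance(y_true, scores))
```

    A function like this could serve as the score that a feature selection loop tries to maximise. If one metric matters more for bot detection than the others, the mean could be replaced by a weighted one.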
