Inside Feature Selection

asiulana Member Posts: 6 Contributor II
edited November 2018 in Help
Hello everyone!

I am trying to understand the feature selection process because I have to explain it thoroughly, so I went to the Java code in addition to the RM manual.

The manual explains Forward Selection as follows:

Forward Selection
1. Create an initial population with n individuals where n is the input example
set’s number of attributes. Each individual will use exactly one of the
features.
2. Evaluate the attribute sets and select only the best k.
3. For each of the k attribute sets do: If there are j unused attributes, make
j copies of the attribute set and add exactly one of the previously unused
attributes to the attribute set.
4. As long as the performance improved in the last p iterations, go to step 2.

I am using SVMWeighting and then applying feature selection with LOOCV, and everything works fine. My microarray data has 7019 attributes (genes) and 108 instances (microarrays).

With SVMWeighting I reduce the number of attributes to 82, and applying FS then reduces it to 5.


My questions regarding the Forward Selection pseudocode are these:
- The "input" for FS has 108 attributes. Does step 1 mean that I have 108 individuals when I create the initial population, and that each individual uses only one feature?

- In step 2, I don't understand what "Evaluate the attribute sets" means. Does the program compute all possible attribute sets? And what do the attribute sets have to do with the individuals?

- In step 3 (and because I don't fully understand step 2), I tried to understand it with some examples, but I don't get the full picture.

If anyone could help, I would really appreciate it =)

Thanks in advance.

Ana Luísa

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Ana Luisa,

    let's see if I can help you.
    - The "input" for FS has 108 attributes. Does step 1 mean that I have 108 individuals when I create the initial population, and that each individual uses only one feature?
    Yes, absolutely. Every individual is a set of attributes, and right after initialization each set contains exactly one attribute.

    - In step 2, I don't understand what "Evaluate the attribute sets" means. Does the program compute all possible attribute sets? And what do the attribute sets have to do with the individuals?
    Evaluate means that the inner operators are applied to measure the performance on an example set that contains all examples, but only the attributes from the current set. As stated above, each individual is an attribute set.
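To make the evaluation step concrete, here is a minimal sketch in plain Python rather than RapidMiner's actual Java code. It assumes a toy 1-nearest-neighbour learner as a stand-in for whatever inner operators the process contains, and it scores one attribute set by leave-one-out accuracy. The names `project`, `one_nn_predict`, `evaluate`, and the tiny data set are all hypothetical.

```python
from typing import FrozenSet, List


def project(rows: List[List[float]], attribute_set: FrozenSet[int]) -> List[List[float]]:
    """Keep all examples, but only the columns listed in attribute_set."""
    cols = sorted(attribute_set)
    return [[row[c] for c in cols] for row in rows]


def one_nn_predict(train_x, train_y, x):
    """Toy stand-in for the inner learner: label of the closest training row."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[nearest]


def evaluate(rows, labels, attribute_set) -> float:
    """Leave-one-out accuracy using only the attributes of one individual."""
    data = project(rows, attribute_set)
    hits = 0
    for i in range(len(data)):
        # Hold out example i, train on the rest, then test on example i.
        train_x = data[:i] + data[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if one_nn_predict(train_x, train_y, data[i]) == labels[i]:
            hits += 1
    return hits / len(data)


# Hypothetical data: column 0 separates the classes, column 1 is misleading.
rows = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
labels = ["a", "a", "b", "b"]

score_col0 = evaluate(rows, labels, frozenset({0}))
score_col1 = evaluate(rows, labels, frozenset({1}))
```

In this sketch only the individuals currently in the population are scored, so the procedure never enumerates all possible attribute sets.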

    - In step 3 (and because I don't fully understand step 2), I tried to understand it with some examples, but I don't get the full picture.
    That's not really a question, but I will try to explain this step.
    Suppose we have m attributes. If we select the top k attribute sets, some attributes will remain unused. Suppose there are j of them.
    Now we enlarge our sets to test whether an additional feature results in better performance. So we create k * j new attribute sets:

    for each A of the k sets do:
        for each of the j unused attributes f do:
            create a new set A ∪ {f}
    So each iteration of steps 2-4 enlarges the attribute sets by one attribute.
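The loop over steps 1-4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's real implementation: `expand`, `forward_selection`, and `toy_score` are hypothetical names, the scoring function is made up, and duplicate sets produced from different parents are merged, so fewer than k * j candidates can result.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

AttributeSet = FrozenSet[str]


def expand(kept: Iterable[AttributeSet], all_attributes: FrozenSet[str]) -> Set[AttributeSet]:
    """Step 3: for each kept set A, add each attribute f not yet in A."""
    candidates: Set[AttributeSet] = set()
    for a in kept:
        for f in all_attributes - a:
            candidates.add(a | {f})
    return candidates


def forward_selection(
    all_attributes: FrozenSet[str],
    score: Callable[[AttributeSet], float],
    k: int = 1,
    p: int = 1,
) -> Tuple[AttributeSet, float]:
    # Step 1: one individual per attribute, each holding exactly one feature.
    population = {frozenset({a}) for a in all_attributes}
    best_set, best_score = frozenset(), float("-inf")
    stale = 0  # consecutive generations without improvement
    while population and stale < p:
        # Step 2: evaluate every individual and keep the best k
        # (ties are broken alphabetically to keep the run deterministic).
        ranked = sorted(population, key=lambda s: (-score(s), sorted(s)))
        kept = ranked[:k]
        top_score = score(kept[0])
        # Step 4: track whether this generation improved on the best so far.
        if top_score > best_score:
            best_set, best_score = kept[0], top_score
            stale = 0
        else:
            stale += 1
        population = expand(kept, all_attributes)
    return best_set, best_score


def toy_score(s: AttributeSet) -> float:
    """Made-up performance: g1 and g2 help, every other attribute hurts a bit."""
    useful = {"g1", "g2"}
    return len(s & useful) - 0.1 * len(s - useful)


best, best_value = forward_selection(frozenset({"g1", "g2", "g3", "g4"}), toy_score, k=2, p=1)
```

With this toy score the search grows the sets one attribute per generation and stops once adding a third attribute no longer improves on the best two-attribute set.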


    Greetings,
      Sebastian