"feature selection loop"

grahamgraham Member Posts: 6 Contributor II
edited May 2019 in Help
I'm new to RapidMiner, so maybe that is why I couldn't find a way to do this:

Let's suppose the ranking of the features by significance is known:

t100,t12,t25,t16,.....

Now I want to iterate as follows:


t(i) = features(1:i):

t(1) = first element of the ordered features      -> t100
t(2) = first two elements of the ordered features -> t100,t12


i = 2
repeat (until there is no improvement in classification performance) {
          p1 = performance of classification with t(i)
          p2 = performance of classification with t(i-1)
          if (p1 <= p2) stop
          i = i + 1
}


Here there are three problems:

1- making the loop
2- my classifier is a neural network, which is sensitive to initial conditions, so each iteration above should be run several times (say 100 times), and the loop should break if the mean of the performances violates the condition.
3- how can I get a table of the mean accuracy for each step at the end?


Please help me as much as possible.

Thanks

Answers

  • cherokeecherokee Member Posts: 82 Maven
    Hi Graham,

    I don't know which RM version you are using, but supposing you're using RM5 the answer is rather easy:

    Ad (1) The loop adding features according to their weight is "Optimize Selection (Weight-Guided)" and can be found at Data Transformation/Attribute Set Reduction and Transformation/Selection/Optimization. Unfortunately the documentation doesn't match its actual parameters, but I think it will work the way you want.

    Ad (2) Use "Loop and Average" (Process Control/Loop)

    Ad (3) Use "Log" (Utility/Logging)

    If you're using RM4: as far as I remember, all those operators existed in previous versions, unfortunately under different names that I don't remember right now.

    Hope I could help,
    chero
  • grahamgraham Member Posts: 6 Contributor II
    Hi chero ,

    Thanks for the reply.

    "Optimize Selection (Weight-Guided)" uses forward and backward feature selection methods, while I already have the list of best features in order. My question is about the minimum number of them that produces the best performance.

    Graham
  • cherokeecherokee Member Posts: 82 Maven
    Hi Graham,

    Well, it uses forward selection. In my documentation backward elimination is not mentioned (perhaps different builds?).

    Anyhow, isn't that what you want? You want to add features one after another (in a given sequence) while your performance improves. Apart from the part in brackets, that is the definition of sequential forward selection, IMHO. With this operator, the sequence of feature addition can be given as attribute weights. If your ranking isn't already some kind of attribute weights, you can use the operator Weight by User Specification (Modeling/Attribute Weighting) to create attribute weights suiting your needs.
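
To make the idea of turning a known ranking into attribute weights concrete, here is a minimal Python sketch. It is not RapidMiner code and does not reproduce the Weight by User Specification operator; the function names `weights_from_ranking` and `prefixes_by_weight` are hypothetical, and the weighting scheme (best feature gets the largest weight) is just one assumption that preserves the order.

```python
from typing import Dict, List


def weights_from_ranking(ranked: List[str]) -> Dict[str, float]:
    """Give the best-ranked feature the largest weight and the last one 1.0."""
    n = len(ranked)
    return {name: float(n - pos) for pos, name in enumerate(ranked)}


def prefixes_by_weight(weights: Dict[str, float]) -> List[List[str]]:
    """Sort features by descending weight and return the subsets t(1), t(2), ..."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return [ordered[:i] for i in range(1, len(ordered) + 1)]


# Ranking taken from the question; only the first four names are given there.
ranking = ["t100", "t12", "t25", "t16"]
weights = weights_from_ranking(ranking)
subsets = prefixes_by_weight(weights)

print(weights)
print(subsets[:2])
```

Any strictly decreasing weights would do; what matters is that sorting by weight gives back the original ranking, so the growing subsets are exactly t(1), t(2), and so on.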

    Best regards,
    chero
  • grahamgraham Member Posts: 6 Contributor II
    Hi chero ,

    Thanks a lot for the reply.

    With your hint I could find a solution for my problem, but I do not know how to run this procedure 100 times and average the performances
    and the confusion matrix table.

    With best regards
    Graham
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="NameBasedWeighting" class="NameBasedWeighting">
            <list key="name_regex_to_weights">
              <parameter key="att1" value="3.0"/>
              <parameter key="att2" value="2.0"/>
              <parameter key="att3" value="1.0"/>
            </list>
            <parameter key="default_weight" value="0.0"/>
        </operator>
        <operator name="WeightGuidedFeatureSelection" class="WeightGuidedFeatureSelection" expanded="yes">
            <parameter key="show_population_plotter" value="true"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="NeuralNetImproved" class="NeuralNetImproved">
                    <list key="hidden_layers">
                    </list>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="File" value="operator.WeightGuidedFeatureSelection.parameter.use_absolute_weights"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
  • cherokeecherokee Member Posts: 82 Maven
    Ok, as the code shows, you are using RM 4.x! Let's see how to do this in that version. The operator is called IteratingPerformanceAverage (Validation/Other). Unfortunately I don't have the right version installed to create a working XML, but you just have to make your XValidation a child of an IteratingPerformanceAverage. I think you will get it.  ;)
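
The idea behind repeating a validation and averaging the result can be sketched in plain Python. This is only an illustration of the concept, not the IteratingPerformanceAverage operator itself: `average_performance` and `noisy_accuracy` are hypothetical names, and `noisy_accuracy` merely simulates a neural network whose accuracy depends on its random initialization.

```python
import random
from statistics import mean, stdev
from typing import Callable, Tuple


def average_performance(
    run_once: Callable[[int], float], repetitions: int = 100
) -> Tuple[float, float]:
    """Run the evaluation once per seed and return (mean, standard deviation)."""
    scores = [run_once(seed) for seed in range(repetitions)]
    return mean(scores), stdev(scores)


def noisy_accuracy(seed: int) -> float:
    """Hypothetical stand-in for one cross-validated training run."""
    rng = random.Random(seed)
    return 0.70 + rng.uniform(-0.05, 0.05)


avg, spread = average_performance(noisy_accuracy, repetitions=100)
print(f"mean accuracy = {avg:.3f}, std = {spread:.3f}")
```

Comparing feature subsets on the mean rather than on a single run reduces the chance that one lucky or unlucky initialization decides whether a feature is kept.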

    Greetings,
    chero
  • grahamgraham Member Posts: 6 Contributor II
    Hi again, and thanks,


    You are right, IteratingPerformanceAverage was exactly what I was looking for.
    But about my main question:
    I ran the process but the output is wrong. It stops after selecting 3 features, while I know performance keeps improving with at least 10 features.


    features: att1  performance: .65
    features: att1,att2  performance: .67
    features: att1,att2,att3  performance: .69

    but the process stops here.



    Any idea?

    Regards
    Graham



  • cherokeecherokee Member Posts: 82 Maven
    Well, what value did you choose for generations without improval? The default of 1? The feature selection stops if there is no improvement for this given number of generations.

    So I expect that having 4 features doesn't improve your performance, so the algorithm stops correctly.
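
A small Python sketch of this kind of stopping rule may help. It assumes the rule is "give up after `patience` consecutive steps without improvement"; it does not claim to match RapidMiner's actual implementation. The function `stop_index` is hypothetical, and only the first three scores come from this thread, the remaining ones are made up for illustration.

```python
from typing import List


def stop_index(scores: List[float], patience: int = 1) -> int:
    """Return the index of the best score found before the search gives up.

    The search gives up after `patience` consecutive steps without improvement.
    """
    best_idx = 0
    stale = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best_idx]:
            best_idx = i
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_idx


# 0.65, 0.67, 0.69 are from the thread; 0.68, 0.71, 0.74 are invented values.
scores = [0.65, 0.67, 0.69, 0.68, 0.71, 0.74]
print(stop_index(scores, patience=1))  # index 2, i.e. three features
print(stop_index(scores, patience=2))  # index 5, i.e. six features
```

With a patience of 1, a single dip at the fourth feature ends the search even though later features would have helped.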
  • grahamgraham Member Posts: 6 Contributor II
    Hi Chero,
    first of all, thank you for your consideration.

    I checked it manually: when adding features, performance keeps improving until the 18th feature. That is the point the process should report,
    but it reports the 3rd feature and I don't know why.
    As a side point, I increased the number of generations without improval, but that changes the order of the best features and some features are removed, while I want to evaluate the performance in a way that depends on my features in the order I listed them.

    again thanks

    With the best
    Graham
  • cherokeecherokee Member Posts: 82 Maven
    Puh...

    Actually I don't know why the order is changed, nor why features are removed (after being added). Maybe one of the developers can answer your question.

    Nevertheless it would be good to have some real data to reproduce your results. Perhaps you can provide some anonymized data.

    Best regards,
    chero
  • grahamgraham Member Posts: 6 Contributor II
    hi

    I got it. Suppose the best number of features is 15 and you start a loop from 2 features: there is no guarantee that performance improves at every step of adding a new feature up to the 15th. I mean, adding features step by step and looking at the performance only finds a local optimum, while the data should be considered as a whole.

    Now, is there any way in RM to add given features one by one and save their performances,
    continuing this procedure until the last feature without stopping when performance drops?
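
Outside RapidMiner, the procedure described here (evaluate every prefix of the ranked list, keep all results, pick the best at the end) could be sketched in Python as follows. The names `sweep_prefixes`, `best_prefix_size` and `toy_evaluate` are hypothetical, and the scores inside `toy_evaluate` are invented purely to make the example runnable; in practice `evaluate` would be the averaged cross-validation performance.

```python
from typing import Callable, List, Sequence, Tuple


def sweep_prefixes(
    ranked: Sequence[str], evaluate: Callable[[Sequence[str]], float]
) -> List[Tuple[int, float]]:
    """Evaluate the first k ranked features for every k, without early stopping."""
    return [(k, evaluate(ranked[:k])) for k in range(1, len(ranked) + 1)]


def best_prefix_size(table: List[Tuple[int, float]]) -> int:
    """Return the smallest k that reaches the highest score in the table."""
    return max(table, key=lambda row: row[1])[0]


def toy_evaluate(features: Sequence[str]) -> float:
    """Hypothetical scores keyed by subset size, for illustration only."""
    made_up = {1: 0.65, 2: 0.67, 3: 0.69, 4: 0.68, 5: 0.72}
    return made_up[len(features)]


ranked = ["att1", "att2", "att3", "att4", "att5"]
table = sweep_prefixes(ranked, toy_evaluate)
for k, score in table:
    print(f"{k} features: {score:.2f}")
print("best number of features:", best_prefix_size(table))
```

The full sweep costs one evaluation per feature, but it cannot get stuck at a local optimum, and the resulting table is exactly the per-step performance log asked for.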

    This is my code, but it stops because it is WeightGuidedFeatureSelection  ::)
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="NameBasedWeighting" class="NameBasedWeighting">
            <list key="name_regex_to_weights">
              <parameter key="att1" value="3.0"/>
              <parameter key="att2" value="2.0"/>
              <parameter key="att3" value="1.0"/>
            </list>
            <parameter key="default_weight" value="0.0"/>
        </operator>
        <operator name="WeightGuidedFeatureSelection" class="WeightGuidedFeatureSelection" expanded="yes">
            <parameter key="show_population_plotter" value="true"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="NeuralNetImproved" class="NeuralNetImproved">
                    <list key="hidden_layers">
                    </list>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="File" value="operator.WeightGuidedFeatureSelection.parameter.use_absolute_weights"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
    any idea?

    thanks
    Graham
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I'm not quite sure if this fits in here, but we have a Forward attribute selection, which adds one attribute after another until a stopping criterion such as a performance decrease is fulfilled. There is one version in the core, and we have a plugin that delivers an extended and more efficient solution.
    If you need to be able to define the start attribute set, we would have to extend this version with this function, but this would be possible for a relatively small fee. If you are interested, feel free to contact me.
    If I missed the topic completely, because I have only read the last post, just ignore this reply :)

    Greetings,
      Sebastian