"feature selection loop"

graham Member Posts: 6 Contributor II
edited May 2019 in Help
I'm new to RapidMiner, so maybe that's why I couldn't find a way to do this.

Suppose the features are already ranked by significance:

t100, t12, t25, t16, ...

Now I want to iterate like this:

t(i) = features(1:i)

t(1) = first element of the ranked features      -> t100
t(2) = first two elements of the ranked features -> t100, t12

repeat until there is no improvement in classification performance {
    p1 = performance of classification with t(i)
    p2 = performance of classification with t(i-1)
    if (p1 > p2) {
        i = i + 1
    }
}


Here there are three problems:

1- building the loop
2- my classifier is a neural network, which is sensitive to initial conditions, so each iteration above should be repeated several times (say 100 times), and the loop should only break if the mean of the performances violates the condition
3- how can I get a table of the mean performances at the end?
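A minimal sketch of the whole procedure in Python, assuming a hypothetical `evaluate(subset)` that stands in for training and cross-validating the classifier on a feature subset (the dummy below just fakes an accuracy); the 100 repetitions average out the neural network's sensitivity to its initial weights:

```python
import random

def evaluate(subset):
    """Hypothetical stand-in for training + cross-validating the classifier
    on the given feature subset; returns a fake accuracy in [0, 1]."""
    random.seed(len(subset))  # deterministic dummy: more features score higher
    return min(0.9, 0.5 + 0.05 * len(subset)) + random.uniform(-0.01, 0.01)

def incremental_selection(ranked, repeats=100):
    """Add features in ranked order; stop once the mean accuracy over
    `repeats` runs no longer improves. Returns a table of (subset, mean)."""
    table = []
    best = float("-inf")
    for i in range(1, len(ranked) + 1):
        subset = ranked[:i]
        mean = sum(evaluate(subset) for _ in range(repeats)) / repeats
        table.append((tuple(subset), mean))
        if mean <= best:  # no improvement over the previous step
            break
        best = mean
    return table

for subset, mean in incremental_selection(["t100", "t12", "t25", "t16"]):
    print(subset, round(mean, 3))
```

This addresses the three problems at once: the `for` loop is (1), the `repeats` averaging is (2), and the returned `table` is (3).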


please help me as much as possible

thanks

Answers

  • cherokee Member Posts: 82  Guru
    Hi Graham,

    I don't know which RM version you are using, but supposing you're using RM 5, the answer is rather easy:

    Ad (1) The loop that adds features according to their weights is "Optimize Selection (Weight-Guided)" and can be found under Data Transformation/Attribute Set Reduction and Transformation/Selection/Optimization. Unfortunately the documentation doesn't match its actual parameters, but I think it will work the way you want.

    Ad (2) Use "Loop and Average" (Process Control/Loop)

    Ad (3) Use "Log" (Utility/Logging)

    If you're using RM 4: as far as I remember, all those operators existed in previous versions, unfortunately under different names that I don't remember right now.

    Hope I could help,
    chero
  • graham Member Posts: 6 Contributor II
    Hi chero ,

    thanks for the reply.

    "Optimize Selection (Weight-Guided)" uses forward and backward feature selection, while I already have the list of best features in order; my question is about the minimum number of them that produces the best performance.

    Graham
  • cherokee Member Posts: 82  Guru
    Hi Graham,

    Well, it uses forward selection; in my documentation backward elimination is not mentioned (perhaps different builds?).

    Anyhow, isn't that what you want? You want to add features one after another (in a given sequence) while your performance improves. Apart from the part in brackets, that is the definition of sequential forward selection, imho. With this operator, the sequence of feature addition can be given as attribute weights. If your ranking isn't already some kind of attribute weights, you can use the operator Weight by User Specification (Modeling/Attribute Weighting) to create attribute weights suiting your needs.

    Best regards,
    chero
  • graham Member Posts: 6 Contributor II
    Hi chero ,

    Thanks a lot for the reply.

    With your hint I could find a solution for my problem, but I don't know how to run this procedure 100 times and average the performances and the confusion matrix table.

    With best
    Graham
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="NameBasedWeighting" class="NameBasedWeighting">
            <list key="name_regex_to_weights">
              <parameter key="att1" value="3.0"/>
              <parameter key="att2" value="2.0"/>
              <parameter key="att3" value="1.0"/>
            </list>
            <parameter key="default_weight" value="0.0"/>
        </operator>
        <operator name="WeightGuidedFeatureSelection" class="WeightGuidedFeatureSelection" expanded="yes">
            <parameter key="show_population_plotter" value="true"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="NeuralNetImproved" class="NeuralNetImproved">
                    <list key="hidden_layers">
                    </list>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="File" value="operator.WeightGuidedFeatureSelection.parameter.use_absolute_weights"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
  • cherokee Member Posts: 82  Guru
    Ok, as the code shows you are using RM 4.x! Let's see how to do this in that version. The operator is called IteratingPerformanceAverage (Validation/Other). Unfortunately I don't have the right version installed to create a working XML, but you just have to make your XValidation a child of an IteratingPerformanceAverage. I think you will get it.  ;)

    Greetings,
    chero
  • graham Member Posts: 6 Contributor II
    Hi again, with much appreciation.

    You are right, IteratingPerformanceAverage was exactly what I was looking for.
    But about my main question: I ran the process and the output is wrong. It stops after selecting 3 features, while I know performance still improves with at least 10 features:


    features: att1  performance: .65
    features: att1,att2  performance: .67
    features: att1,att2,att3  performance: .69

    but the process stops here



    any idea???

    Regards
    Graham



  • cherokee Member Posts: 82  Guru
    Well, what value did you choose for generations without improvement? The default of 1?! The feature selection stops if there has been no improvement for that number of generations.

    So I expect that a 4th feature doesn't improve your performance, and the algorithm stops correctly.
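    To illustrate with made-up performance values: the stopping rule only counts how many consecutive generations failed to beat the best result so far, so with the default of 1 a single dip ends the search:

```python
def stop_index(performances, patience=1):
    """Index at which selection stops: the first point where `patience`
    consecutive generations have failed to beat the best so far."""
    best, stale = float("-inf"), 0
    for i, p in enumerate(performances):
        if p > best:
            best, stale = p, 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return len(performances) - 1

perf = [0.65, 0.67, 0.69, 0.68, 0.70, 0.72]  # dips once, then recovers

print(stop_index(perf, patience=1))  # -> 3 (stops at the first dip)
print(stop_index(perf, patience=2))  # -> 5 (survives the dip, runs to the end)
```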
  • graham Member Posts: 6 Contributor II
    Hi chero,
    first of all, thank you for your consideration.

    I checked it manually: adding features improves the performance up to the 18th feature. That is the point the process should report, but it reports the 3rd feature, and I don't know why.
    As a side point, I increased the number of generations without improvement, but that changes the order of the best features, and some features get removed, while I want to evaluate the performance depending only on the features I listed, in order.

    Again, thanks.

    With the best,
    Graham
  • cherokee Member Posts: 82  Guru
    Puh...

    actually I don't know why the order is changed, nor why features are removed (after being added). Maybe one of the developers can answer your question.

    Nevertheless it would be good to have some real data to reproduce your results. Perhaps you can provide some anonymized data.

    Best regards,
    chero
  • graham Member Posts: 6 Contributor II
    Hi,

    I got it. Suppose the best number of features is 15 and you start the loop from 2 features: there is no guarantee that performance improves at every single step of adding a new feature up to the 15th. Adding features step by step and checking the performance each time only finds a local optimum, while the feature list should be considered as a whole.

    Now, is there any way in RM to add the given features one by one, save their performances, and continue until the last feature, without stopping when performance decreases?
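    In Python terms this would mean scoring every prefix of the ranked list and only afterwards picking the best one; a sketch, with a hypothetical `evaluate` whose fake accuracy peaks at 15 features:

```python
def evaluate(subset):
    """Hypothetical scorer: a fake accuracy that peaks at 15 features."""
    n = len(subset)
    return 0.9 - 0.001 * (n - 15) ** 2

def score_all_prefixes(ranked):
    """Score every prefix of the ranked list -- no early stopping, so the
    choice is made only after the full table of performances exists."""
    scores = [(i, evaluate(ranked[:i])) for i in range(1, len(ranked) + 1)]
    best_i, _ = max(scores, key=lambda t: t[1])
    return scores, best_i

ranked = ["t%d" % i for i in range(1, 21)]  # 20 ranked feature names
scores, best_i = score_all_prefixes(ranked)
print(best_i)  # -> 15
```

    Here no intermediate dip can end the search early, because every prefix is evaluated before the best subset size is chosen.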

    This is my process, but it stops early because it is a WeightGuidedFeatureSelection  ::)
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="NameBasedWeighting" class="NameBasedWeighting">
            <list key="name_regex_to_weights">
              <parameter key="att1" value="3.0"/>
              <parameter key="att2" value="2.0"/>
              <parameter key="att3" value="1.0"/>
            </list>
            <parameter key="default_weight" value="0.0"/>
        </operator>
        <operator name="WeightGuidedFeatureSelection" class="WeightGuidedFeatureSelection" expanded="yes">
            <parameter key="show_population_plotter" value="true"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="NeuralNetImproved" class="NeuralNetImproved">
                    <list key="hidden_layers">
                    </list>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <list key="log">
                  <parameter key="File" value="operator.WeightGuidedFeatureSelection.parameter.use_absolute_weights"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
    any idea?

    thanks
    Graham
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    I'm not quite sure if this fits in here, but we have a forward attribute selection, which adds one attribute after another until a stopping criterion such as a performance decrease is fulfilled. There is one version in the core, and we have a plugin that delivers an extended and more efficient solution.
    If you need to be able to define the starting attribute set, we would have to extend this version with that function, but this would be possible for a relatively small fee. If you are interested, feel free to contact me.
    If I missed the topic completely, because I have only read the last post, just ignore this reply :)

    Greetings,
      Sebastian