"Feature selection stability validation"

MarlaBot The Friendly RapidMiner Dog Bot Administrator, Moderator, Employee, Member Posts: 57 Community Manager
edited May 2019 in Help
A RapidMiner user wants to know the answer to this question: Are there any tutorials or best practices for feature selection stability validation?

Answers

  • ozgeozyazar Member Posts: 21 Maven
    Hi! I need to figure out how to apply the feature selection stability validation process. It is really important for the application part of my thesis. Has anyone worked with this process?

    Sincerely, 

    özge 
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @ozgeozyazar

    Can you check the link below and see if it is helpful?

    https://rapidminer.com/blog/multi-objective-optimization-feature-selection/
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,287 RM Data Scientist
    The Feature Selection extension can validate the selection via the Jaccard index. Is that what you are referring to?

    BR,
    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • ozgeozyazar Member Posts: 21 Maven
    Hi @mschmitz !
    Nearly, it is. But as far as I know, the Feature Selection Stability Validation operator uses the Kuncheva index. I would like to use this operator but cannot find any worked example. Could you please point me to a tutorial or example that explains how to use the process?

  • Maerkli Member Posts: 84 Guru
    Since you are looking for an example, this could perhaps help: the book "RapidMiner Data Mining Use Cases" by Markus Hofmann and Ralf Klinkenberg, Chapter XVI. There is an example of a weighting operator placed inside the Feature Selection Stability Validation. It concerns an application in Neutrino Astronomy.
    Maerkli
     

  • ozgeozyazar Member Posts: 21 Maven
    Hi @Maerkli

    Unfortunately, I have no way to get hold of the book immediately. Actually, some resources indicate that the operator works the same way as X-Validation. The problem is, I cannot figure out which operator/model I should apply inside the stability operator. If there is an example that answers that, could you please help me?

    Regards, 
  • Maerkli Member Posts: 84 Guru
    Sorry for the late answer, I took some days off. Some features used for this Neutrino experiment are no longer supported, so I don't know if it makes sense to send the XML files. But here are some passages of the explanation given in Chapter 16:
    16.3.6 Feature Selection Stability
    When running a feature selection algorithm, not only the selection of attributes itself is
    important, but also the stability of the selection has to be taken into account. The stability indicates how much the choice of a good attribute set is independent of the particular sample of examples. If the subsets of features chosen on the basis of different samples are very different, the choice is not stable. The difference of feature sets can be expressed by statistical indices.
    Fortunately, an operator for the evaluation of the feature selection is also included in
    the Feature Selection extension for RapidMiner. The operator itself is named
    Feature Selection Stability Validation.
    This operator is somewhat similar to a usual cross validation. It performs an attribute
    weighting on a predefined number of subsets and outputs two stability measures. Detailed options as well as the stability measures will be explained later in this section.
    In order to reliably estimate the stability of a feature selection, one should loop over the
    number of attributes selected in a specific algorithm. For the problem at hand, the process again commences with two Read AML operators that are appended to form a single set of examples. This single example set is then connected to the input port of a Loop Parameters operator. The settings of this operator are rather simple, and are depicted in Figure 16.10.
    The Feature Selection Stability Validation (FSSV) is placed inside the Loop
    Parameters operator accompanied by a simple Log operator (see Figure 16.11). The
    two output ports of the FSSV are connected to the input ports of the Log operators. A
    Log operator stores any selected quantity. For the problem at hand, these are the Jaccard index [13] and Kuncheva's index [14]. The Jaccard index S(Fa; Fb) computes the ratio of the intersection and the union of two feature subsets, Fa and Fb:

    S(Fa, Fb) = |Fa ∩ Fb| / |Fa ∪ Fb|

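    Not RapidMiner code, but both indices are simple to state in Python. A minimal sketch (the attribute names are made up for illustration); Kuncheva's index additionally needs the total number of features, because it corrects the overlap for what would be expected by chance:

    ```python
    # Sketch of the two stability indices logged by the Feature Selection
    # Stability Validation operator, computed on plain Python sets of
    # attribute names. Not the extension's actual implementation.

    def jaccard_index(fa, fb):
        """|Fa ∩ Fb| / |Fa ∪ Fb| -- 1.0 means identical feature subsets."""
        fa, fb = set(fa), set(fb)
        if not fa and not fb:
            return 1.0  # two empty selections are trivially identical
        return len(fa & fb) / len(fa | fb)

    def kuncheva_index(fa, fb, n_total):
        """Kuncheva's consistency index for two subsets of equal size k
        drawn from n_total features; corrects for chance overlap."""
        fa, fb = set(fa), set(fb)
        k = len(fa)
        assert len(fb) == k, "Kuncheva's index assumes equal subset sizes"
        if k == 0 or k == n_total:
            return 1.0  # degenerate cases: the overlap is forced
        r = len(fa & fb)                 # size of the intersection
        expected = k * k / n_total       # overlap expected by chance
        return (r - expected) / (k - expected)

    a = {"att1", "att2", "att3"}
    b = {"att2", "att3", "att4"}
    print(jaccard_index(a, b))       # 2/4 = 0.5
    print(kuncheva_index(a, b, 10))  # (2 - 0.9) / (3 - 0.9) ≈ 0.524
    ```

    Note that the Jaccard index stays positive even for random selections, while Kuncheva's index can go negative when the overlap is worse than chance, which is why the chapter logs both.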
    The settings for the Log operator are depicted in Figure 16.12. It consists of two fields,
    the first one being column name. Entries can be added and removed using the Add Entry and Remove Entry buttons, respectively. The entry for column name can be basically anything.
    It is helpful to document the meaning of the logged values by a mnemonic name.
    The second field offers a drop-down menu from which any operator of the process can
    be selected. Whether a certain value that is computed during the process or a process
    parameter shall be logged, is selected from the drop-down menu in the third panel. The
    fourth field offers the selection of output values or process parameters, respectively, for the selected operator.
    An operator for attribute weighting is placed inside the FSSV. For the problem at hand,
    Select by MRMR/Cfs is used. However, any other feature selection algorithm can be
    used as well.
    As can be seen, the process for selecting features in a statistically valid and stable manner is quite complex. However, it is also very effective. Here, for a number of attributes between 30 and 40, both stability measures, the Jaccard index and Kuncheva's index, lie well above 0.9. Both indices reach their maximum of 1.0 if only one attribute is selected. This indicates that there is one single attribute for the separation of signal and background that is selected under all circumstances. Since other attributes also enhance the learning performance, about 30 more attributes are selected. This substantially decreases the original number of dimensions.
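    The chapter describes the FSSV as similar to a cross validation: a weighting scheme runs on several subsamples and the pairwise index of the resulting subsets is averaged. A hypothetical Python analogue of that idea (not the extension's actual code; a top-k-by-variance selector stands in for Select by MRMR/CFS, and all data is synthetic):

    ```python
    # Hypothetical analogue of the FSSV idea: select the top-k features on
    # several random subsamples and average the pairwise Jaccard index of
    # the resulting subsets. The variance-based selector merely stands in
    # for a real weighting operator such as Select by MRMR/CFS.
    import random
    from itertools import combinations
    from statistics import variance

    def top_k_by_variance(rows, k):
        """Indices of the k columns with the highest sample variance."""
        n_cols = len(rows[0])
        scores = [variance(row[j] for row in rows) for j in range(n_cols)]
        return set(sorted(range(n_cols), key=lambda j: -scores[j])[:k])

    def jaccard(fa, fb):
        return len(fa & fb) / len(fa | fb)

    def selection_stability(rows, k, n_subsets=10, ratio=0.7, seed=42):
        """Mean pairwise Jaccard index of subsets selected on subsamples."""
        rng = random.Random(seed)
        subsets = [
            top_k_by_variance(rng.sample(rows, int(ratio * len(rows))), k)
            for _ in range(n_subsets)
        ]
        pairs = list(combinations(subsets, 2))
        return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

    # Toy data: 100 examples, 8 attributes with clearly different spreads,
    # so the selection should come out stable (stability close to 1.0).
    rng = random.Random(0)
    data = [[rng.gauss(0, j + 1) for j in range(8)] for _ in range(100)]
    print(selection_stability(data, k=3))
    ```

    Looping this over several values of k mirrors what the Loop Parameters operator does around the FSSV in the chapter's process.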


    Link for the companion site:

    I hope that you can do something with that.
    Bonne soirée,
    Maerkli
