"Weka vs RapidMiner Feature Selection"

hgwelec Member Posts: 31 Maven
edited May 2019 in Help
Hello,


I was wondering how one can use RM to perform cross-validated feature selection *without* a learning method for evaluating the worth of a subset of attributes. For example, in WEKA one can use a cross-validated GainRatio attribute evaluator with a Ranker search method, without any classifier. Is this setup possible in RM?
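For readers unfamiliar with the filter approach mentioned here, the following is a minimal Python sketch of ranking nominal attributes by gain ratio without any classifier. It is not RapidMiner or WEKA code; the function names (`entropy`, `gain_ratio`, `rank_by_gain_ratio`) and the toy data are assumptions made for illustration only, and numeric attributes would need to be discretized first.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """Information gain of the attribute divided by its split information."""
    groups = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    total = len(labels)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_by_gain_ratio(columns, labels):
    """Return (attribute name, score) pairs, best attribute first."""
    scores = {name: gain_ratio(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "perfect" determines the class, "useless" does not.
toy_columns = {
    "useless": ["p", "q", "p", "q"],
    "perfect": ["x", "x", "y", "y"],
}
toy_labels = ["a", "a", "b", "b"]
ranking = rank_by_gain_ratio(toy_columns, toy_labels)
```

No learner is involved at any point: the ranking depends only on the attribute values and the class labels.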



Many Thanks,



Harry
Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Probably yes, but I am not sure I fully understand what such a process would do. If you could explain the complete validation process in detail, we may be able to explain how it can be achieved (if possible) with RapidMiner.

    Cheers,
    Ingo
  • hgwelec Member Posts: 31 Maven
    Hello Ingo,


    With WEKA one can choose a subset attribute evaluator (say CfsSubsetEval) together with a Best-First search method, and the attribute selection can be performed by

    1) using the whole training set
    2) using cross-validation

    I don't know if I made it clear enough; if you have WEKA available, you can check this setting on the "Select Attributes" tab of the Weka Explorer.


    Could you tell me if such a setup can be implemented in RapidMiner? I think not, because all Validation operators accept only a Model as one of their inputs.



    Thanks Again,


    Harry
  • hgwelec Member Posts: 31 Maven
    Hello again Ingo!


    I just found this post:


    http://lifeanalytics.blogspot.com/2008/10/sowhats-important.html


    It shows feature selection being performed with 10-fold cross-validation. One of the figures shows the "goodness" of each attribute as the number of folds in which it is chosen.


    Harry
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi,

    Unfortunately, I do not quite understand what specifically you intend to do, so I cannot say whether it is possible in RapidMiner. However, the general feature selection procedure works as follows. Each feature selection operator needs inner operators that are given an example set and must return a performance vector. How this performance vector is created (by cross-validation, by attribute set evaluators such as CFSFeatureSetEvaluator, etc.) does not matter for the feature selection. The search method is specified by choosing the appropriate feature selection operator (e.g. FeatureSelection for forward/backward selection, GeneticAlgorithm for an evolutionary search, BruteForce for an exhaustive search, etc.).
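The separation described above (a search method that only sees a score, and an inner evaluation that produces it) can be sketched in Python as follows. This is an illustration under assumptions, not RapidMiner's implementation: `forward_selection`, `toy_evaluator` and the scoring rule are hypothetical names and values, and the toy evaluator merely stands in for whatever inner operators would return as a performance value.

```python
from typing import Callable, FrozenSet, Sequence

# An evaluator maps a candidate attribute subset to a single score.
# How the score is produced (filter measure, cross-validated learner, ...)
# is invisible to the search.
Evaluator = Callable[[FrozenSet[str]], float]


def forward_selection(attributes: Sequence[str], evaluate: Evaluator) -> FrozenSet[str]:
    """Greedy forward search: add the best attribute while the score improves."""
    selected: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    while True:
        best_candidate = None
        for attr in attributes:
            if attr in selected:
                continue
            score = evaluate(selected | {attr})
            if score > best_score:
                best_score = score
                best_candidate = attr
        if best_candidate is None:
            return selected  # no single addition improves the score
        selected = selected | {best_candidate}


def toy_evaluator(subset: FrozenSet[str]) -> float:
    """Hypothetical score: reward two 'useful' attributes, penalize the rest."""
    useful = {"petallength", "petalwidth"}
    return len(subset & useful) - 0.5 * len(subset - useful)


chosen = forward_selection(
    ["sepallength", "sepalwidth", "petallength", "petalwidth"], toy_evaluator
)
```

Swapping `toy_evaluator` for a different scoring function changes what is measured without touching the search, which is the point made in the paragraph above.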

    Hope that helps. Otherwise please explain exactly what you intend to do.

    Regards,
    Tobias
  • hgwelec Member Posts: 31 Maven
    Hello Tobias,

    First of all, I am quite new to data mining, so I apologize if my questions seem vague. I will do my best to explain what I am after.


    To move on to the problem: I am a WEKA user who found RM to be more versatile to work with. However, I am now trying to do tasks (such as feature selection) that I used to do with WEKA.

    WEKA performs feature selection through wrapper approaches (using a classifier to evaluate the worth of a feature subset) or filter approaches. I am interested in the filter approaches, not the wrapper methods.

    WEKA is able to cross-validate (e.g. using 10-fold cross-validation) a feature subset found by CfsSubsetEval with Best-First forward selection. The results for the IRIS dataset are as follows:


    === Run information ===

    Evaluator:    weka.attributeSelection.CfsSubsetEval
    Search:       weka.attributeSelection.BestFirst -D 1 -N 5
    Relation:     iris
    Instances:    150
    Attributes:   5
                  sepallength
                  sepalwidth
                  petallength
                  petalwidth
                  class
    Evaluation mode:    10-fold cross-validation



    === Attribute selection 10 fold cross-validation (stratified), seed: 1 ===

    number of folds (%)  attribute
               0(  0 %)   1 sepallength
               0(  0 %)   2 sepalwidth
              10(100 %)   3 petallength
              10(100 %)   4 petalwidth

    (The WEKA experiment setup is attached.)


    Please notice that the results show that:

    1) The evaluation mode was 10-fold cross-validation (the type of evaluation that I want to do in RapidMiner).

    2) petallength and petalwidth were selected in all 10 folds (which I think means that those 2 features have more predictive value than sepallength and sepalwidth).
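The "number of folds" column can be reproduced in spirit with a short Python sketch: run a filter-based subset selection on the training part of each fold and count how often each attribute is chosen. Everything here is an assumption made for illustration. The merit function is a simplified CFS-style score using absolute Pearson correlations, the search is exhaustive rather than Best-First, the folds are interleaved rather than stratified, and the data are synthetic, so this is neither WEKA's nor RapidMiner's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def cfs_like_merit(columns, label, subset):
    """Simplified CFS-style merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[a], label)) for a in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def best_subset(columns, label):
    """Exhaustive search over non-empty subsets (feasible only for few attributes)."""
    names = sorted(columns)
    candidates = [c for size in range(1, len(names) + 1) for c in combinations(names, size)]
    return max(candidates, key=lambda s: cfs_like_merit(columns, label, s))


def fold_selection_counts(columns, label, n_folds):
    """Count in how many folds each attribute is selected on the training part."""
    n = len(label)
    counts = {name: 0 for name in columns}
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_cols = {name: [vals[i] for i in train_idx] for name, vals in columns.items()}
        train_label = [label[i] for i in train_idx]
        for name in best_subset(train_cols, train_label):
            counts[name] += 1
    return counts


# Synthetic data: "signal" tracks the numeric class code, "noise" does not.
label = [i % 2 for i in range(40)]
columns = {
    "signal": [label[i] + 0.1 * ((i // 2) % 5) for i in range(40)],
    "noise": [(7 * i) % 11 for i in range(40)],
}
counts = fold_selection_counts(columns, label, n_folds=5)
```

An attribute that is selected in every fold, like petallength and petalwidth in the output above, is one whose selection does not depend on which part of the data was held out.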


    So, can RM perform such an analysis? To be even more specific, can we evaluate the following feature selection process setup with cross-validation?



    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="D:\MyDocuments\DataMining\TrainingFiles\mydata.csv"/>
            <parameter key="label_name" value="class"/>
        </operator>
        <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
            <operator name="CFSFeatureSetEvaluator" class="CFSFeatureSetEvaluator">
                <parameter key="keep_example_set" value="true"/>
            </operator>
        </operator>
    </operator>



    I hope my description is clearer now. Again, many thanks for your help!


    Harry


    [attachment deleted by admin]