"Problem with Feature Selection"

vitalimario Member Posts: 6 Contributor II
edited June 2019 in Help
Hi,


I am trying to perform feature selection, apply the reduced feature subset to a J48 tree, and finally deliver that tree to the output.

As you will see, I have the 'create_complete_model' option checked.
Unfortunately, *all* feature selection tutorials in RM use either SVM or nearest-neighbour models, neither of which returns an output!

Here is my setup:

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="c:\MyDocuments\score.csv"/>
        <parameter key="label_name" value="class"/>
    </operator>
    <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
        <parameter key="selection_direction" value="backward"/>
        <operator name="SimpleValidation" class="SimpleValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <operator name="W-J48" class="W-J48">
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
        <operator name="ProcessLog" class="ProcessLog">
            <list key="log">
              <parameter key="Generation" value="operator.LibSVMLearner.value.applycount"/>
            </list>
        </operator>
    </operator>
</operator>




What am I doing wrong?

Answers

  • steffen Member Posts: 347 Maven
    Hello

    I may have misunderstood you, but:
    The decision tree you create inside the validation is only used by the performance-measurement process to determine the best (possibly sub-optimal) set of features. If you want a decision tree built on the best feature subset, you must add something like this at the top level, after the FeatureSelection operator:

        <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
        </operator>
        <operator name="W-J48 (2)" class="W-J48">
        </operator>
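
    Put together with your first process, a minimal sketch could look like the block below. It only reuses operators that already appear in this thread. As assumptions on my side, I left out the ProcessLog and the create_complete_model parameter to keep it short, and I wrote the file path with a single colon; adjust these to your own setup.

```xml
<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <!-- path written with a single colon here (assumption); use your own file -->
        <parameter key="filename" value="c:\MyDocuments\score.csv"/>
        <parameter key="label_name" value="class"/>
    </operator>
    <!-- wrapper: the inner W-J48 is only used to score candidate feature subsets -->
    <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
        <parameter key="selection_direction" value="backward"/>
        <operator name="SimpleValidation" class="SimpleValidation" expanded="yes">
            <operator name="W-J48" class="W-J48">
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
    <!-- top level: keep only the selected attributes, then train the final tree on them -->
    <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
    </operator>
    <operator name="W-J48 (2)" class="W-J48">
    </operator>
</operator>
```

    The second W-J48 is the one whose tree should show up in the results, because it is the last learner at the top level of the process.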
    Hope this was helpful,

    Steffen
  • vitalimario Member Posts: 6 Contributor II
    Hi Steffen,

    Let me be more specific :)

    Originally, I wanted to know whether the setup in the RM tutorial 12_WrapperValidation.xml can use a J48 decision tree (instead of JMySVMLearner and a regression problem) and deliver the result (i.e. the decision tree) to the output. I changed the setup, but I cannot get the final tree (after feature selection) to the output.

    Here is the final setup:


    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\test.csv"/>
            <parameter key="label_name" value="class"/>
        </operator>
        <operator name="WrapperXValidation" class="WrapperXValidation" expanded="yes">
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
                <operator name="FSXValidation" class="XValidation" expanded="yes">
                    <parameter key="sampling_type" value="shuffled sampling"/>
                    <operator name="W-J48" class="W-J48">
                    </operator>
                    <operator name="FSOperatorChain" class="OperatorChain" expanded="yes">
                        <operator name="FSModelApplier" class="ModelApplier">
                            <list key="application_parameters">
                            </list>
                        </operator>
                        <operator name="ClassificationPerformance" class="ClassificationPerformance">
                            <parameter key="accuracy" value="true"/>
                            <list key="class_weights">
                            </list>
                            <parameter key="classification_error" value="true"/>
                            <parameter key="kappa" value="true"/>
                            <parameter key="spearman_rho" value="true"/>
                            <parameter key="weighted_mean_precision" value="true"/>
                            <parameter key="weighted_mean_recall" value="true"/>
                        </operator>
                        <operator name="FSMinMaxWrapper" class="MinMaxWrapper">
                            <parameter key="minimum_weight" value="0.5"/>
                        </operator>
                    </operator>
                </operator>
            </operator>
            <operator name="W-J48 (2)" class="W-J48">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance (2)" class="ClassificationPerformance">
                    <parameter key="accuracy" value="true"/>
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                    <parameter key="kappa" value="true"/>
                    <parameter key="weighted_mean_recall" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>




    Thanks!

  • steffen Member Posts: 347 Maven
    Hello again

    Hm, OK, I think I've got it now. Since WrapperXValidation has no option to build a complete model, you cannot do it "in the same process".

    You just have to repeat this part:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="FeatureSelection" class="FeatureSelection" expanded="no">
            <operator name="FSXValidation" class="XValidation" expanded="no">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="W-J48" class="W-J48">
                </operator>
                <operator name="FSOperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="FSModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="ClassificationPerformance" class="ClassificationPerformance">
                        <parameter key="accuracy" value="true"/>
                        <list key="class_weights">
                        </list>
                        <parameter key="classification_error" value="true"/>
                        <parameter key="kappa" value="true"/>
                        <parameter key="spearman_rho" value="true"/>
                        <parameter key="weighted_mean_precision" value="true"/>
                        <parameter key="weighted_mean_recall" value="true"/>
                    </operator>
                    <operator name="FSMinMaxWrapper" class="MinMaxWrapper">
                        <parameter key="minimum_weight" value="0.5"/>
                    </operator>
                </operator>
            </operator>
        </operator>
        <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
        </operator>
        <operator name="W-J48 (2)" class="W-J48">
        </operator>
    </operator>
    because this is exactly what a "build complete model" option does (or would do).

    Hope this was helpful,

    Steffen
  • vitalimario Member Posts: 6 Contributor II
    Dear Steffen,

    It worked GREAT! Thanks again :)