
Need help on removing classifier model skew

ram_nit05 Member Posts: 12 Contributor II
edited September 2019 in Help
Hi,

To provide a brief background to my exercise:
My objective is to create an SVM classifier model that classifies customer feedback (attribute) into one of several categories (label). For this I am generating features from the feedback verbatims, which I then pass as attributes to the model.

The issue I am facing is that the classification errors show the model is highly skewed towards the categories with the most occurrences (for the highest-frequency segment: class precision is low but class recall is high); that is, the lower-frequency categories are also being predicted as the highest-frequency one. I have tried weighting the lower-frequency segments to even out the differences in occurrences, but the errors only get magnified. Please let me know if there is any other way this can be controlled. (An illustrative, non-RapidMiner sketch of this setup follows at the end of this post.)

Many thanks in advance,
Ram
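
For illustration only (this is not the original RapidMiner process; the data, names, and library below are illustrative assumptions), the setup described above, feedback verbatims turned into features with a class-weighted SVM on top, might look roughly like this in Python with scikit-learn:

    # Illustrative sketch with hypothetical data; class_weight="balanced" is one way
    # to upweight the rarer categories, analogous to the weighting attempt above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    feedback = ["app keeps crashing", "great support team", "I was billed twice"]  # hypothetical verbatims
    category = ["bug", "praise", "billing"]                                        # hypothetical labels

    model = make_pipeline(
        TfidfVectorizer(),                   # feedback verbatims -> feature vectors
        LinearSVC(class_weight="balanced"),  # give rare categories more weight
    )
    model.fit(feedback, category)
    print(model.predict(["the invoice amount is wrong"]))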

Answers

  • keith Member Posts: 157 Maven
    My first thought would be to either oversample the categories that occur less frequently, or to take only a portion of the very frequent categories, so that the training set has approximately equal proportions of each category. This might do better than just giving a higher weight to the examples from rare categories (a rough sketch of this resampling idea follows at the end of this post).

    You might also look at the MetaCost operator to increase the penalty for misclassifying the rarer instances.

    Hope that helps.  I'm sure other people smarter than me will chime in as well.  :-)

    Keith
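
    Outside RapidMiner, the resampling idea above might be sketched roughly like this (illustrative only; the function and the data layout are assumptions, not an existing operator):

        import random
        from collections import defaultdict

        def oversample_to_balance(examples, labels, seed=42):
            """Roughly equalize class frequencies by sampling the rarer classes
            with replacement up to the size of the largest class; undersampling
            the frequent classes works the same way in the other direction."""
            rng = random.Random(seed)
            by_class = defaultdict(list)
            for x, y in zip(examples, labels):
                by_class[y].append(x)
            target = max(len(xs) for xs in by_class.values())
            out_x, out_y = [], []
            for y, xs in by_class.items():
                picks = xs + [rng.choice(xs) for _ in range(target - len(xs))]
                out_x.extend(picks)
                out_y.extend([y] * len(picks))
            return out_x, out_y

        # balanced_x, balanced_y = oversample_to_balance(train_x, train_y)  # hypothetical variables

    Whichever direction you resample, apply it to the training data only, so the test data keeps its natural class proportions.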


  • ram_nit05 Member Posts: 12 Contributor II
    Many thanks Keith for the help.

  • ram_nit05 Member Posts: 12 Contributor II
    Hi Keith,

    I tried using the MetaCost operator in my modeling flow today; however, I got an error saying that it cannot take numerical attributes, and I am not sure whether it should go before or after the XValidation operator in the flow. Could you please point me to a link where I can find information on this?

    Thanks,
    Ram
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    there is probably an error in your process setup. It seems to me that you have used a learner inside the MetaCost operator that does not support numerical attributes. You should check that.

    Greetings,
      Sebastian
  • ram_nit05 Member Posts: 12 Contributor II
    Many thanks for your help, Sebastian.
  • brianbaker Member Posts: 24 Maven
    I got the same numerical-attributes error with a learner that does support numerical attributes and that throws no error outside of MetaCost.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    please be a little more specific. It would help a great deal if you posted the process, for example, and described what you are trying to do.

    Greetings,
      Sebastian
  • brianbaker Member Posts: 24 Maven
    This works:

        <operator name="SimpleValidation (2)" class="SimpleValidation" breakpoints="after" expanded="yes">
            <parameter key="local_random_seed" value="10"/>
            <operator name="JMySVMLearner" class="JMySVMLearner">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="max_iterations" value="100"/>
                <parameter key="calculate_weights" value="true"/>
                <parameter key="return_optimization_performance" value="true"/>
                <parameter key="estimate_performance" value="true"/>
                <parameter key="balance_cost" value="true"/>
            </operator>
            <operator name="ApplierChain (3)" class="OperatorChain" expanded="yes">
                <operator name="Applier (3)" class="ModelApplier">
                    <parameter key="keep_model" value="true"/>
                    <list key="application_parameters">
                    </list>
                    <parameter key="create_view" value="true"/>
                </operator>
                <operator name="BinominalClassificationPerformance (2)" class="BinominalClassificationPerformance">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="main_criterion" value="AUC"/>
                    <parameter key="AUC" value="true"/>
                    <parameter key="lift" value="true"/>
                    <parameter key="false_positive" value="true"/>
                    <parameter key="false_negative" value="true"/>
                    <parameter key="true_positive" value="true"/>
                    <parameter key="true_negative" value="true"/>
                </operator>
            </operator>
    But this doesn't; it fails with: "Error in: MetaCost (MetaCost) This learning scheme does not have sufficient capabilities for the given data set: numerical attributes not supported"

        <operator name="MetaCost" class="MetaCost" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="cost_matrix" value="[0.0 1.0;5.0 0.0]"/>
            <operator name="SimpleValidation (2)" class="SimpleValidation" breakpoints="after" expanded="yes">
                <parameter key="local_random_seed" value="10"/>
                <operator name="JMySVMLearner" class="JMySVMLearner">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="max_iterations" value="100"/>
                    <parameter key="calculate_weights" value="true"/>
                    <parameter key="return_optimization_performance" value="true"/>
                    <parameter key="estimate_performance" value="true"/>
                    <parameter key="balance_cost" value="true"/>
                </operator>
                <operator name="ApplierChain (3)" class="OperatorChain" expanded="yes">
                    <operator name="Applier (3)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="BinominalClassificationPerformance (2)" class="BinominalClassificationPerformance">
                        <parameter key="keep_example_set" value="true"/>
                        <parameter key="main_criterion" value="AUC"/>
                        <parameter key="AUC" value="true"/>
                        <parameter key="lift" value="true"/>
                        <parameter key="false_positive" value="true"/>
                        <parameter key="false_negative" value="true"/>
                        <parameter key="true_positive" value="true"/>
                        <parameter key="true_negative" value="true"/>
                    </operator>
                </operator>
            </operator>
        </operator>
    Thank you for your help. I have a small positive rate (< 10%) and a small data set, so I'd like to modify the cost for the learner and use cross-validation rather than oversampling (so I don't have to split into training, test, and validation sets).
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    although you just sent a small part of the process, I can definitely say that this will not work. The MetaCost operator needs an inner learner to operate on, hence the name MetaCost. It works like this:
    For performing a cross-validation you need an inner learner. You want to modify the SVM for the imbalanced class set by using the MetaCost operator. Then put the SVM directly into the MetaCost operator and then put the MetaCost operator as learner inside the SVM.

    Greetings,
      Sebastian
  • brianbaker Member Posts: 24 Maven
    This confuses me:
    put the SVM directly into the MetaCost operator and then put the MetaCost operator as learner inside the SVM
    Did you mean this: put the SVM directly into the MetaCost operator and then put the MetaCost operator as learner inside the XValidation?

    I tried that and it works, so I think I am using it correctly. I'm getting the balancing I'm after. :) (For comparison, a rough sketch of the same nesting outside RapidMiner follows after the process below.)

        <operator name="confidence estimate" class="XValidation" breakpoints="after" expanded="yes">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="create_complete_model" value="true"/>
            <operator name="MetaCost (2)" class="MetaCost" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="cost_matrix" value="[0.0 3.0;1.0 0.0]"/>
                <operator name="KernelNaiveBayes (5)" class="KernelNaiveBayes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="estimation_mode" value="full"/>
                    <parameter key="number_of_kernels" value="35"/>
                </operator>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier (2)" class="ModelApplier">
                    <parameter key="keep_model" value="true"/>
                    <list key="application_parameters">
                    </list>
                    <parameter key="create_view" value="true"/>
                </operator>
                <operator name="Performance (2)" class="Performance">
                    <parameter key="keep_example_set" value="true"/>
                </operator>
            </operator>
    Thanks for your help!!
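
    For comparison outside RapidMiner, the same nesting, a cost-sensitive wrapper that owns the inner learner and is itself handed to cross-validation, might look roughly like the sketch below (illustrative only; the class and variable names are assumptions, and the orientation of the cost matrix, rows as true class and columns as predicted class, should be checked against the MetaCost documentation):

        import numpy as np
        from sklearn.base import BaseEstimator, ClassifierMixin, clone
        from sklearn.model_selection import cross_val_predict
        from sklearn.naive_bayes import GaussianNB

        class CostSensitiveWrapper(BaseEstimator, ClassifierMixin):
            """Plays the role of MetaCost here: it trains the inner learner and then
            predicts the class with the lowest expected cost under cost_matrix."""
            def __init__(self, base_estimator, cost_matrix):
                self.base_estimator = base_estimator
                self.cost_matrix = cost_matrix

            def fit(self, X, y):
                self.model_ = clone(self.base_estimator).fit(X, y)
                self.classes_ = self.model_.classes_
                return self

            def predict(self, X):
                probs = self.model_.predict_proba(X)          # (n_examples, n_classes)
                costs = probs @ np.asarray(self.cost_matrix)  # expected cost of each possible prediction
                return self.classes_[np.argmin(costs, axis=1)]

        # Usage sketch with hypothetical X and y: the wrapper sits inside the
        # cross-validation, just as MetaCost sits inside XValidation above.
        # preds = cross_val_predict(CostSensitiveWrapper(GaussianNB(), [[0.0, 3.0], [1.0, 0.0]]), X, y, cv=10)

    Under that orientation assumption, a [[0.0, 3.0], [1.0, 0.0]] matrix makes misclassifying an example of the first class three times as costly as misclassifying one of the second, which shifts predictions toward the first class.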
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    what confused you was my own confusion. Of course I meant it the way you actually did it. :)

    Greetings,
      Sebastian