
"Calculate performance only on TRUE cases??"

bobdobbs Member Posts: 26 Maven
edited May 2019 in Help
Hello,

First off, I want to say thank you for this great software.  I LOVE RapidMiner!!!

On to my question...

We are looking at creating an SVM for detecting positive indications of a medical condition.  
We have training data that is labeled "true" and "false" along with all the features.  (True examples are those where the person has the medical condition.  They represent about 20% of the training data.)

When running a grid parameter optimization or a feature selection operator, we are having trouble finding an ideal result.

WE DON'T CARE ABOUT THE NEGATIVE OR "FALSE" CASES.  We only care about the accuracy of the "true" cases.  

The problem is that the accuracy performance measure is the average of accuracy for BOTH cases (true and false).  For example, if we just predict everything as false, since 80% of our examples are false, then we automatically have 40% accuracy, but ZERO correct predictions for the class we care about.

*** I guess what we ultimately want to do is train a SINGLE CLASS SVM that is focused on predicting the true class as accurately as possible. ***

So we don't need a performance score based on the aggregate accuracy of the model, but ONLY ON THE ACCURACY OF THE "TRUE" PREDICTIONS.

One thought was to use class weighting in either the SVM or the classification performance operators, but how much weight, and which one should we use?

Another thought was to use some creative application of the MetaCost operator, but how would we incorporate that with the LibSVM learner??

Is this possible in RM?

Any and all ideas would be appreciated.  :)


Answers

  • cherokee Member Posts: 82 Maven
    Hi bobdobbs,

    What you need is the CostEvaluator operator in the category Validation --> Performance. This operator allows you to specify (mis)classification costs for every possible combination of true class and predicted class.
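    Just to illustrate the idea outside of RapidMiner (a plain-Python sketch with made-up class names and cost values), cost-based evaluation is essentially a lookup of each (true class, predicted class) pair in a cost matrix:

        # Hypothetical sketch of cost-based evaluation -- not RapidMiner code.
        # Keys of the cost matrix are (true class, predicted class) pairs.
        costs = {
            ("true",  "true"):  0.0,   # correct positive: no cost
            ("true",  "false"): 1.0,   # missed positive (false negative)
            ("false", "true"):  5.0,   # false alarm (false positive), penalized harder here
            ("false", "false"): 0.0,   # correct negative: no cost
        }

        labels      = ["true", "false", "false", "true", "false"]
        predictions = ["true", "false", "true",  "false", "false"]

        avg_cost = sum(costs[(y, p)] for y, p in zip(labels, predictions)) / len(labels)
        print(avg_cost)   # average misclassification cost per example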

    Greetings,
    Michael
  • bobdobbs Member Posts: 26 Maven
    Hello,

    The CostEvaluator operator is very useful for measuring the overall success of the model, but it doesn't help train the SVM to focus on finding more positive cases.

    Ideally, it might be good to use the grid parameter operator to test multiple weights for the true class in the SVM training to find the optimal setting. However, it appears that I can't control that from the grid parameter operator.

    I also have no idea what costs to assign in the cost evaluator.  Should they be in the range of -1 to 1, or perhaps -100 to 100 ???

    Thanks!!!!
  • cherokee Member Posts: 82 Maven
    Hi,
    bobdobbs wrote:

    The CostEvaluator operator is very useful for measuring the overall success of the model, but it doesn't help train the SVM to focus on finding more positive cases.
    Of course you are right. This doesn't help with training the learners.
    bobdobbs wrote:
    Ideally, it might be good to use the grid parameter operator to test multiple weights for the true class in the SVM training to find the optimal setting. However, it appears that I can't control that from the grid parameter operator.
    Well, you can, with a little trick. Define macro values for each class weight and assign them in the SVM. Then you can vary the macros via the grid parameter optimization operator, as in this example:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
        </operator>
        <operator name="GridParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="SingleMacroDefinition.value" value="1.0,0.5,0.25"/>
              <parameter key="SingleMacroDefinition (2).value" value="1.0,0.5,0.25"/>
            </list>
            <operator name="SingleMacroDefinition" class="SingleMacroDefinition">
                <parameter key="macro" value="negative_weight"/>
                <parameter key="value" value="1.0"/>
            </operator>
            <operator name="SingleMacroDefinition (2)" class="SingleMacroDefinition">
                <parameter key="macro" value="positive_weight"/>
                <parameter key="value" value="0.5"/>
            </operator>
            <operator name="XValidation" class="XValidation" breakpoints="after" expanded="yes">
                <operator name="LibSVMLearner" class="LibSVMLearner">
                    <list key="class_weights">
                      <parameter key="positive" value="%{positive_weight}"/>
                      <parameter key="negative" value="%{negative_weight}"/>
                    </list>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Performance" class="Performance">
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>
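    For comparison, the same weight-search idea expressed outside RapidMiner (a scikit-learn sketch in Python; the data, weight grid, and scoring choice are only placeholders) looks roughly like this:

        # Sketch: grid search over SVM class weights, optimizing precision of the positive class.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        # Imbalanced toy data: roughly 80% negative, 20% positive.
        X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

        param_grid = {
            "C": [0.1, 1.0, 10.0],
            "class_weight": [{0: 1.0, 1: w} for w in (1.0, 2.0, 4.0)],  # upweight the positive class
        }
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="precision", cv=5)
        search.fit(X, y)
        print(search.best_params_, search.best_score_)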
  • bobdobbs Member Posts: 26 Maven
    Wow,

    That's a GREAT trick!!!  That will be useful for many projects :)

    I still see a problem:

    80% of our training examples are false and 20% are true.  We are training an SVM to accurately find true cases.

    The problem is that ALL of the learning classifiers look at "accuracy" or "precision" as a measure of their strength, especially when using some kind of grid search.  The accuracy decides which feature or parameter combination is the "best".

    Our problem comes from the unbalanced nature of our data.  If RM just predicts EVERY case as false, then we have an automatic accuracy of 40%.  (80% false, 0% true averages out to 40% accurate.)  Now if the SVM attempts to predict some values as true, it may or may not have success, but initially it is less than 40%.  SO, THE MODEL THAT PREDICTS EVERY EXAMPLE AS FALSE GENERALLY WINS.  Clearly this is not what we want!! 

    One situation that just came up was that the SVM predicted 10 examples as true and 9,990 as false (out of 10,000 training examples).  It was 100% accurate for the true examples and about 80% accurate for the false examples.  This averaged to an accuracy of 90%.  Again, clearly not what we want, even though the performance measure was very high.

    Ideally, what I want is a way to ask RM to train and ONLY evaluate performance based on accuracy of the true examples.  I don't care if it has a 99% failure in predicting false cases.  I just want the best percentage I can in predicting true cases.

    (I've read some papers where the researchers wrote their own SVM in C++ and built it to focus on correct true prediction.  Almost like a one-class SVM, but with some negative examples.)

    Is there some clever way to do this in RM??




  • steffen Member Posts: 347 Maven
    Hello

    Why don't you tell the GridParameterOptimization to use another performance measure? Since you have a binary classification problem, you could use the BinominalClassificationPerformance operator, especially AUC and lift. These measures focus on the quality of prediction of the positive class.
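    Conceptually, AUC only looks at how well the model ranks positives above negatives, which is why it is robust against the 80/20 imbalance. A minimal sketch of computing it from the model's confidences (Python with scikit-learn, made-up numbers, just for illustration):

        # Sketch: AUC from predicted confidences for the positive class (illustration only).
        from sklearn.metrics import roc_auc_score

        y_true  = [1, 0, 0, 1, 0, 1, 0, 0]                    # 1 = "sick"
        y_score = [0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8]    # confidence that the case is "sick"
        print(roc_auc_score(y_true, y_score))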

    regards,

    Steffen
  • bobdobbs Member Posts: 26 Maven
    Nice idea!

    Now my classes are labelled "sick" and "not_sick".

    Do I need to change that with some operator to make it a true "binary" problem?

    How do I tell RM which is the positive class?

    Thanks!!!!
  • fischer Member Posts: 439 Maven
    Hi,
    bobdobbs wrote:

    Do I need to change that with some operator to make it a true "binary" problem?
    You can always use the InternalBinominalRemapping operator to define which is the "positive" class and which is the "negative" one. If you are not using this operator, it may appear more or less random which is which, unless you use .aml files.

    Cheers,
    Simon
  • bobdobbs Member Posts: 26 Maven
    Simon,

    I can't seem to find this operator.  Where is it?

    Thank You
  • fischer Member Posts: 439 Maven
    Oops, I'm sorry. That operator was not yet included in the last release. You can use "Mapping" instead. In the "value_mappings" list, simply map "not_sick" to "not_sick" and "sick" to "sick". Just make sure to start with the negative class, so it will be assigned index 0.

    Cheers,
    Simon
  • bobdobbs Member Posts: 26 Maven
    Thanks Simon,

    I just built a setup with a parameter grid for both C and the weight of the positive class.  It will probably take several hours to run, but I'm very curious to see what the results will be!!!

    Thanks again!!

    :)
  • cherokee Member Posts: 82 Maven
    bobdobbs wrote:

    Ideally, what I want is a way to ask RM to train and ONLY evaluate performance based on accuracy of the true examples.  I don't care if it has a 99% failure in predicting false cases.  I just want the best percentage I can in predicting true cases.
    Well, I'm afraid this is not what you really(!) want, is it?
    If this is what you really want -- just predict the true cases and ignore the false cases -- then I would just predict everything as true! 100% accuracy for the true class, 100% error for the false class.

    I'm nearly sure you have to think of a secondary condition for your model, e.g. not predicting more than 50% positive in total.

    Best regards,
    Michael
  • bobdobbs Member Posts: 26 Maven
    Cherokee,

    Your suggestion wouldn't work.  Only about 20% of my training examples are true.  So if I predict everything as true, then my accuracy for that class is only 20%.  What I'm looking for is accuracy of 100% (Never possible in the real world, but the goal is to see how close I can get.)

    My point was that I only care about the accuracy of predicting true cases. 
  • fischer Member Posts: 439 Maven
    OK, just to be sure: are we talking about the same things? The phrase "accuracy of the true class" is a bit uncommon. Do we agree on these definitions of accuracy, precision, and recall?

    http://en.wikipedia.org/wiki/Accuracy#Accuracy_in_binary_classification
    http://en.wikipedia.org/wiki/Precision_(information_retrieval)
    http://en.wikipedia.org/wiki/Recall_(information_retrieval)
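    For quick reference, in terms of the confusion matrix counts (TP, FP, TN, FN):

        accuracy  = (TP + TN) / (TP + TN + FP + FN)
        precision = TP / (TP + FP)
        recall    = TP / (TP + FN)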

    Cheers,
    Simon


  • bobdobbs Member Posts: 26 Maven
    Maybe I have the nomenclature wrong.


    Our goal is to predict the "true" class as well as possible.

    So the measure I need is accuracy, but only for the true class.

    Modifying the formula from wikipedia:

    number_of_predicted_true_that_are_correct / total_number_predicted_true

    In other words: of all the cases predicted true by RM, what percentage of them are actually true?

    i.e.  If RM predicts that 100 cases are true, but only 42 of them are actually true (correct predictions), then I would say that we have a 42% accuracy of predicting a sick person with this model.

    I hope that I didn't over-explain this...

  • fischer Member Posts: 439 Maven
    So what you are looking for is the "precision", not the accuracy. You can compute the precision with the BinominalClassificationPerformance operator (if the positive and negative classes are correctly assigned).

    Cheers,
    Simon
  • bobdobbs Member Posts: 26 Maven
    Good point.  Is it correct to assume that the "precision" only deals with the "true" class??

    Alternatively, would the AUC be a better measure when comparing different models??
    (For example, with a grid search for the best value of C in an SVM?)

    Remember, even though this is technically a two class problem, I'm really only interested in the best performance for predicting the true class.  IT WOULD BE SAFER FOR US TO WRONGLY PREDICT SOME TRUE EXAMPLES AS FALSE.  WHAT IS DANGEROUS IS WRONGLY PREDICTING SOME FALSE EXAMPLES AS TRUE.  (Giving treatment to someone who isn't sick poses a huge risk.)

    Thanks!!
  • fischer Member Posts: 439 Maven
    bobdobbs wrote:

    Good point.  Is it correct to assume that the "precision" only deals with the "true" class??
    Yes.
    bobdobbs wrote:

    Alternatively, would the AUC be a better measure when comparing different models??
    (For example, with a grid search for the best value of C in an SVM?)

    Remember, even though this is technically a two class problem, I'm really only interested in the best performance for predicting the true class.  IT WOULD BE SAFER FOR US TO WRONGLY PREDICT SOME TRUE EXAMPLES AS FALSE.  WHAT IS DANGEROUS IS WRONGLY PREDICTING SOME FALSE EXAMPLES AS TRUE.  (Giving treatment to someone who isn't sick poses a huge risk.)
    You don't have to shout at me :-)

    Since in all your postings you have been describing precision as the desired measure, I wonder why you would now switch to AUC. However, this is your choice and it purely depends on the domain and your goals. Keep in mind that for evaluating your results it is easily possible to compute both measures.

    Best,
    Simon
  • bobdobbs Member Posts: 26 Maven
    Simon,

    I'm very sorry.  I didn't realize that I was shouting.... (I was using the caps to emphasize a key point, I did not mean to offend you.)

    I learned something today.  I always thought precision looked at both true and false classes.  I had no idea it was only for the true class.  That helps a lot.

    I'm not sure which would be a better measure of performance:
    1) AUC
    2) Precision with a threshold finder beforehand to optimize results

    Additionally, one thing that worries me is the number of correct results.  For example:  I may have 1000 training examples with 200 that are true.  If the model only predicts 5 as true, but correctly, then the precision would be 100%.  Unfortunately, a model that only found 5 out of 200 wouldn't be very good for practical uses.

    So, I need some combination of precision and "volume", or something like that.  Is there such a thing?

    Thanks again for all the help, we really appreciate it over here!!!

  • fischer Member Posts: 439 Maven
    bobdobbs wrote:

    I'm very sorry.  I didn't realize that I was shouting.... (I was using the caps to emphasize a key point, I did not mean to offend you.)
    See the smiley in my post? I was not offended :-)
    bobdobbs wrote:

    Additionally, one thing that worries me is the number of correct results.  For example:  I may have 1000 training examples with 200 that are true.  If the model only predicts 5 as true, but correctly, then the precision would be 100%.  Unfortunately, a model that only found 5 out of 200 wouldn't be very good for practical uses.

    So, I need some combination of precision and "volume", or something like that.  Is there such a thing?
    You can also look at the f-measure or any of the other performance measures offered by the performance operators. Which applies best in your case I cannot say.
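    For reference, the (balanced) f-measure is the harmonic mean of precision and recall:

        f = 2 * precision * recall / (precision + recall)

    so it is only high when the model both finds a reasonable share of the true cases AND is right about the cases it flags.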

    However, if you are unhappy with all of them, you can still construct your own, e.g. by using an AttributeConstruction in combination with an Aggregation. Alternatively, go with the BinominalClassificationPerformance, log the values of false positives etc. using a ProcessLog, turn that log into an ExampleSet, use an AttributeConstruction on it, and turn it back into a PerformanceVector using Data2Performance.
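    As an illustration of what such a hand-rolled measure could look like (plain Python, not a RapidMiner operator; the beta weighting is just one possible choice), here is an F-beta style score that weights precision more heavily than recall:

        # Sketch: a custom criterion built from confusion matrix counts.
        def custom_score(tp, fp, fn, beta=0.5):
            precision = tp / (tp + fp) if (tp + fp) else 0.0
            recall    = tp / (tp + fn) if (tp + fn) else 0.0
            # beta < 1 favours precision over recall (standard F-beta formula).
            denom = beta ** 2 * precision + recall
            return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

        print(custom_score(tp=42, fp=58, fn=158))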

    Hope this helps,
    Simon
  • bobdobbs Member Posts: 26 Maven
    Simon,

    You are correct:  The f-measure is exactly what I am thinking of.

    Interestingly (at least to me), over repeated trials with different parameter settings, the f-measure and the AUC seem to always correlate perfectly.  I guess this makes sense since they are both, in effect, measuring the performance of the model.

    Thank You!!!!


    -B