RapidMiner

How to Optimize Meta-Cost Matrix

SOLVED
Learner III michaelgloven

Hi, 

 

We have a classification process that attempts to predict infrequent events (around 1 in 1,000) in a large dataset, and we are using the MetaCost operator to place more value on the performance for these events in order to minimize false negatives. Is there a method to optimize the cost matrix against class recall values, or does the user just need to iterate through cost matrix values to arrive at acceptable results?

 

thanks!

7 REPLIES
RM Certified Expert

Re: How to Optimize Meta-Cost Matrix

I am not sure I fully understand your question: are you talking about MetaCost or Performance (Costs)? You might want to look at Performance (Costs), which also lets you use a cost matrix. The main criterion of whatever performance operator you put inside your cross-validation (select it, if more than one is available) is what will be optimized. You can then put the entire cross-validation inside an optimization operator if you want to do a grid search across different parameters as well.
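
To make the grid-search idea concrete outside of RapidMiner, here is a minimal Python sketch. It is not the MetaCost operator and does not reproduce its internals. It assumes you already have held-out confidences for the rare class and the true labels, that correct predictions cost nothing, and it uses made-up data and hypothetical function names. It tries a list of candidate false-negative costs and returns the smallest one that reaches a target recall.

```python
def predict_min_cost(p_pos, cost_fn, cost_fp):
    """Predict positive when that is the cheaper choice in expectation.

    Expected cost of predicting negative: p_pos * cost_fn
    Expected cost of predicting positive: (1 - p_pos) * cost_fp
    """
    return p_pos * cost_fn > (1.0 - p_pos) * cost_fp


def recall(y_true, y_pred):
    """Share of true positives that were predicted positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    positives = sum(1 for t in y_true if t)
    return hits / positives if positives else 0.0


def smallest_cost_for_recall(p_pos, y_true, candidates, target, cost_fp=1.0):
    """Grid search: smallest false-negative cost that reaches the target recall."""
    for cost_fn in sorted(candidates):
        y_pred = [predict_min_cost(p, cost_fn, cost_fp) for p in p_pos]
        if recall(y_true, y_pred) >= target:
            return cost_fn
    return None  # no candidate was large enough


# Made-up held-out confidences for the rare class, and the true labels.
p_pos = [0.05, 0.10, 0.30, 0.02, 0.60, 0.01]
y_true = [False, True, True, False, True, False]

best = smallest_cost_for_recall(p_pos, y_true, [1, 2, 5, 10, 20], target=1.0)
print(best)
```

Raising the false-negative cost buys recall at the price of more false positives, so in practice you would track both numbers (or a total cost) for every candidate value rather than recall alone.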

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Learner III michaelgloven
Solution

Re: How to Optimize Meta-Cost Matrix

Hi Brian,

 

I'm actually working with technical support on getting further documentation regarding this operator. Once I get more info I'll post back.

 

Mike

RM Staff

Re: How to Optimize Meta-Cost Matrix

Dear Mike,

 

I think MetaCost is not the only operator you want to focus on. I do think weights/sampling and threshold finding are of equal importance.


First of all, you need to define a performance measure that reflects your needs. It should be higher for the cases that are of higher value to you and lower for the others. This can be done on a class level or on an example level.

 

Afterwards you train an algorithm. Most algorithms are biased towards the majority class to begin with. One way to overcome this is to up- or down-sample, or to use weights. My personal "quick fix" is the Generate Weight (Stratification) operator. It adds a weight attribute such that the sum of weights is equal for every class. I would set the total weight roughly to your number of examples.

If the learner you use supports weights, it will now treat both classes as balanced. You can of course scale the weights of one class further to push your learner in the direction you want.
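
As a rough illustration of what such a stratified weighting does, here is a small Python sketch. It is not RapidMiner code; the function names and the toy labels are made up. Every class receives the same share of the total weight, and that share is split evenly among the examples of the class.

```python
from collections import Counter


def stratified_weights(labels, total_weight=None):
    """Return one weight per example so that every class has the same weight sum."""
    if total_weight is None:
        total_weight = float(len(labels))  # total weight ~ number of examples
    counts = Counter(labels)
    per_class = total_weight / len(counts)  # equal share for each class
    return [per_class / counts[y] for y in labels]


def scale_class(weights, labels, target_class, factor):
    """Multiply the weights of one class to push the learner towards it."""
    return [w * factor if y == target_class else w
            for w, y in zip(weights, labels)]


labels = ["rare"] * 2 + ["common"] * 8
weights = stratified_weights(labels)              # rare: 2.5 each, common: 0.625 each
boosted = scale_class(weights, labels, "rare", 3.0)
print(weights[0], weights[-1], boosted[0])
```

With ten examples and a total weight of ten, each class sums to five, so a weighted confusion matrix still adds up to the number of examples.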

Tightly connected to this is the option to change the threshold at which an example is assigned to class A. By default, we take the class with the maximum confidence. With a threshold you can set this cut-off by hand. That way you can do things like "only if confidence(fraud) > 0.9, call it fraud".

The concrete threshold value would be a meta-parameter of the model.
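
A tiny Python sketch of the thresholding idea, with made-up confidences and a hypothetical helper name (this is not a RapidMiner API). For two classes, a threshold of 0.5 gives the same result as picking the class with the maximum confidence, ties aside.

```python
def apply_threshold(confidences, threshold=0.5, positive="fraud", negative="ok"):
    """Call an example positive only if its confidence exceeds the threshold."""
    return [positive if c > threshold else negative for c in confidences]


confidence_fraud = [0.95, 0.60, 0.91, 0.10]

default = apply_threshold(confidence_fraud)                # maximum-confidence rule
strict = apply_threshold(confidence_fraud, threshold=0.9)  # "only if > 0.9"

print(default)  # ['fraud', 'fraud', 'fraud', 'ok']
print(strict)   # ['fraud', 'ok', 'fraud', 'ok']
```

A high threshold like 0.9 reduces false positives. If the goal is fewer false negatives on a rare class, you would lower the threshold for that class instead.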

 

Cheers,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Learner III michaelgloven

Re: How to Optimize Meta-Cost Matrix

Good ideas, and easy to implement. Thanks!

Learner III michaelgloven

Re: How to Optimize Meta-Cost Matrix

Hi Martin,

 

The class recall performance on one of my classes (binominal) is quite poor, and I'm looking for methods to improve results but can't find documentation on how they work. I wanted to see if you could provide some insights:

 

1) The meta-cost operator impacts the confusion matrix but I'd like to know how it works. One can enter real numbers into the cost matrix but it's a mystery what's happening - are these numbers added, multiplied, divided or something more sophisticated? For example, entering ".50" into the bottom left cell of a 2x2 cost matrix does what to the bottom left cell of the 2x2 confusion matrix, or am I missing the overall point here?

 

2) I'm allowing my learner to use weights, but I don't see where I can overweight one class versus another. Is this done by using the Generate Attributes operator to create a new weight attribute for use in the learner? Setting the total weight to 1 simply assigns 1/2 to each of the two binominal classes, which is then divided among the examples of the respective class. Is the intent to then take the example weights from one class and make them larger or smaller through an if/then calculation in the Generate Attributes operator?

 

appreciate your response

 

Mike

 

 

RM Certified Expert

Re: How to Optimize Meta-Cost Matrix

Here are some thoughts in response to your most recent questions:

1) The cost matrix is literally that: the values that you place in each cell are supposed to correspond to the "cost" of that outcome, either in real dollar terms or simply in relative terms. The algorithm will then attempt to build a model that minimizes the total cost associated with the outcomes in the confusion matrix. So if you have a value of 0.5 in one cell and 1 in another cell, you are saying that the first outcome is half as costly as the second, which means you can afford two of those for every one of the other. If you have positive outcomes (e.g., predictions that are correct), you can also represent those with negative numbers (i.e., benefits instead of costs). The relative differences between the cells are the important thing when using this operator.

2) As mentioned previously, if you simply want to balance your two classes, you can use "Generate Weight (Stratification)" and that will assign weights so that the weight sum of each class is equal.  If you have another weighting scheme you would prefer, you can create it using "Generate Attributes" and then use "Set Role" to define your weight attribute accordingly.  Once again, it is the relative weights that matter, rather than the absolute weights, although making the total weight equal to your total number of examples will make your confusion matrix easier to interpret.
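
To show the arithmetic behind point 1, here is a small Python sketch with made-up confusion matrices. It only illustrates the "total cost" idea and is not how the operator is implemented. Cells are keyed by (true class, predicted class) to avoid assuming a particular row/column layout, so check the operator documentation for the orientation RapidMiner uses.

```python
def total_cost(confusion, costs):
    """Sum count * cost over every (true class, predicted class) cell."""
    return sum(count * costs.get(cell, 0.0) for cell, count in confusion.items())


# Two hypothetical models evaluated on the same 1,000 examples.
model_a = {("rare", "rare"): 6, ("rare", "common"): 4,
           ("common", "rare"): 30, ("common", "common"): 960}
model_b = {("rare", "rare"): 2, ("rare", "common"): 8,
           ("common", "rare"): 10, ("common", "common"): 980}

# A missed rare event costs 1, a false alarm costs 0.5, correct cells cost 0.
costs_low = {("rare", "common"): 1.0, ("common", "rare"): 0.5}
# Same false-alarm cost, but a missed rare event is now ten times as costly.
costs_high = {("rare", "common"): 10.0, ("common", "rare"): 0.5}

print(total_cost(model_a, costs_low), total_cost(model_b, costs_low))    # 19.0 13.0
print(total_cost(model_a, costs_high), total_cost(model_b, costs_high))  # 55.0 85.0
```

With the low miss cost, model B is cheaper even though it misses more rare events. With the high miss cost, model A wins. Only the ratio between the cells changed, which is why the relative values are what matter.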

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Learner III michaelgloven

Re: How to Optimize Meta-Cost Matrix

OK, with your insights I understand how these work. For reference, looking at the cost matrix: if I have 1.0 as my False Negative cost and 10.0 as my False Positive cost, I am saying that False Positives are 10 times more expensive than False Negatives. I believe I understand how the math works now, as it is a comparison of the two off-diagonal cells.
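
For readers who want the numbers behind that comparison, here is a short Python sketch. It assumes that correct predictions cost nothing and that the model's confidences behave like probabilities. Neither assumption is guaranteed for a given learner, and the function name is made up. Under those assumptions, predicting the positive class is cheaper whenever p * cost_fn > (1 - p) * cost_fp, which rearranges to p > cost_fp / (cost_fp + cost_fn).

```python
def break_even_threshold(cost_fn, cost_fp):
    """Confidence above which predicting positive has the lower expected cost.

    Assumes zero cost for correct predictions and probability-like confidences.
    """
    return cost_fp / (cost_fp + cost_fn)


# False Negative = 1.0, False Positive = 10.0, as in the example above.
print(round(break_even_threshold(1.0, 10.0), 3))   # 0.909

# Reversed ratio: misses are ten times as costly as false alarms.
print(round(break_even_threshold(10.0, 1.0), 3))   # 0.091
```

So a 1:10 cost ratio corresponds to roughly the "only if confidence > 0.9" rule mentioned earlier in the thread, while reversing the ratio pushes the cut-off down to about 0.09, which is the direction you want when false negatives on a rare class are the main concern.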

 

Many thanks!