AUPRC with imbalanced classes

kypexin · April 2018

Hi, it seems I am not getting the expected results when using Performance (AUPRC) with a highly imbalanced dataset.

The relationship between recall and precision of the positive class seems pretty intuitive, but I still get AUPRC = 0.010 no matter what I change:

Screenshot 2018-04-25 23.28.32.png Screenshot 2018-04-25 23.28.14.png

I am using the imbalanced credit card fraud dataset here.

At the same time, when I artificially balance the data, AUPRC shows the expected 'normal' values:

Screenshot 2018-04-25 23.35.06.png Screenshot 2018-04-25 23.34.59.png

Process attached:

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve creditcard" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../data/creditcard"/>
      </operator>
      <operator activated="true" class="sample" compatibility="8.1.003" expanded="true" height="82" name="equalize classes" width="90" x="179" y="34">
        <parameter key="balance_data" value="true"/>
        <list key="sample_size_per_class">
          <parameter key="1" value="492"/>
          <parameter key="0" value="492"/>
        </list>
        <list key="sample_ratio_per_class"/>
        <list key="sample_probability_per_class"/>
      </operator>
      <operator activated="false" class="sample_stratified" compatibility="8.1.003" expanded="true" height="82" name="sample 50k" width="90" x="45" y="340">
        <parameter key="sample_size" value="50000"/>
      </operator>
      <operator activated="false" class="create_threshold" compatibility="8.1.003" expanded="true" height="68" name="Create Threshold" width="90" x="581" y="391">
        <parameter key="threshold" value="0.09"/>
        <parameter key="first_class" value="0"/>
        <parameter key="second_class" value="1"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="8.1.003" expanded="true" height="103" name="Split Data" width="90" x="246" y="136">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.8"/>
          <parameter key="ratio" value="0.2"/>
        </enumeration>
        <parameter key="sampling_type" value="stratified sampling"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.003" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="false" class="concurrency:parallel_decision_tree" compatibility="8.1.003" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="136">
            <parameter key="apply_pruning" value="false"/>
            <parameter key="apply_prepruning" value="false"/>
          </operator>
          <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="246" y="34">
            <list key="beta_constraints"/>
            <list key="expert_parameters"/>
          </operator>
          <operator activated="false" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning" width="90" x="380" y="136">
            <enumeration key="hidden_layer_sizes">
              <parameter key="hidden_layer_sizes" value="50"/>
              <parameter key="hidden_layer_sizes" value="50"/>
            </enumeration>
            <enumeration key="hidden_dropout_ratios"/>
            <list key="expert_parameters"/>
            <list key="expert_parameters_"/>
          </operator>
          <operator activated="false" class="stacking" compatibility="8.1.003" expanded="true" height="68" name="Stacking" width="90" x="179" y="289">
            <process expanded="true">
              <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model (2)" width="90" x="179" y="187">
                <list key="beta_constraints"/>
                <list key="expert_parameters"/>
              </operator>
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.003" expanded="true" height="103" name="Decision Tree (2)" width="90" x="112" y="34">
                <parameter key="apply_pruning" value="false"/>
                <parameter key="apply_prepruning" value="false"/>
              </operator>
              <operator activated="true" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning (2)" width="90" x="112" y="340">
                <enumeration key="hidden_layer_sizes">
                  <parameter key="hidden_layer_sizes" value="20"/>
                  <parameter key="hidden_layer_sizes" value="20"/>
                </enumeration>
                <enumeration key="hidden_dropout_ratios"/>
                <list key="expert_parameters"/>
                <list key="expert_parameters_"/>
              </operator>
              <connect from_port="training set 1" to_op="Decision Tree (2)" to_port="training set"/>
              <connect from_port="training set 2" to_op="Generalized Linear Model (2)" to_port="training set"/>
              <connect from_port="training set 3" to_op="Deep Learning (2)" to_port="training set"/>
              <connect from_op="Generalized Linear Model (2)" from_port="model" to_port="base model 2"/>
              <connect from_op="Decision Tree (2)" from_port="model" to_port="base model 1"/>
              <connect from_op="Deep Learning (2)" from_port="model" to_port="base model 3"/>
              <portSpacing port="source_training set 1" spacing="0"/>
              <portSpacing port="source_training set 2" spacing="0"/>
              <portSpacing port="source_training set 3" spacing="0"/>
              <portSpacing port="source_training set 4" spacing="0"/>
              <portSpacing port="sink_base model 1" spacing="0"/>
              <portSpacing port="sink_base model 2" spacing="0"/>
              <portSpacing port="sink_base model 3" spacing="0"/>
              <portSpacing port="sink_base model 4" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.6.001" expanded="true" height="124" name="Generalized Linear Model (3)" width="90" x="45" y="34">
                <list key="beta_constraints"/>
                <list key="expert_parameters"/>
              </operator>
              <connect from_port="stacking examples" to_op="Generalized Linear Model (3)" to_port="training set"/>
              <connect from_op="Generalized Linear Model (3)" from_port="model" to_port="stacking model"/>
              <portSpacing port="source_stacking examples" spacing="0"/>
              <portSpacing port="sink_stacking model" spacing="0"/>
            </process>
          </operator>
          <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
          <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.1.003" expanded="true" height="82" name="apply on train" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="operator_toolbox:performance_auprc" compatibility="1.0.000" expanded="true" height="82" name="perf train" width="90" x="246" y="34">
            <parameter key="main_criterion" value="AUPRC"/>
            <parameter key="AUC" value="true"/>
            <parameter key="AUPRC" value="true"/>
          </operator>
          <connect from_port="model" to_op="apply on train" to_port="model"/>
          <connect from_port="test set" to_op="apply on train" to_port="unlabelled data"/>
          <connect from_op="apply on train" from_port="labelled data" to_op="perf train" to_port="labelled data"/>
          <connect from_op="perf train" from_port="performance" to_port="performance 1"/>
          <connect from_op="perf train" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.1.003" expanded="true" height="82" name="apply on test" width="90" x="581" y="136">
        <list key="application_parameters"/>
      </operator>
      <operator activated="false" class="select_recall" compatibility="8.1.003" expanded="true" height="82" name="Select Recall" width="90" x="581" y="289">
        <parameter key="min_recall" value="0.8"/>
        <parameter key="positive_label" value="1"/>
      </operator>
      <operator activated="false" class="apply_threshold" compatibility="8.1.003" expanded="true" height="82" name="Apply Threshold" width="90" x="715" y="289"/>
      <operator activated="true" class="performance" compatibility="8.1.003" expanded="true" height="82" name="perf test" width="90" x="715" y="136"/>
      <operator activated="true" class="operator_toolbox:performance_auprc" compatibility="1.0.000" expanded="true" height="82" name="perf test (2)" width="90" x="849" y="136">
        <parameter key="main_criterion" value="AUPRC"/>
        <parameter key="accuracy" value="false"/>
        <parameter key="AUPRC" value="true"/>
      </operator>
      <connect from_op="Retrieve creditcard" from_port="output" to_op="equalize classes" to_port="example set input"/>
      <connect from_op="equalize classes" from_port="example set output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Validation" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="apply on test" to_port="unlabelled data"/>
      <connect from_op="Validation" from_port="model" to_op="apply on test" to_port="model"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
      <connect from_op="apply on test" from_port="labelled data" to_op="perf test" to_port="labelled data"/>
      <connect from_op="Select Recall" from_port="example set" to_op="Apply Threshold" to_port="example set"/>
      <connect from_op="Select Recall" from_port="threshold" to_op="Apply Threshold" to_port="threshold"/>
      <connect from_op="perf test" from_port="performance" to_op="perf test (2)" to_port="performance"/>
      <connect from_op="perf test" from_port="example set" to_op="perf test (2)" to_port="labelled data"/>
      <connect from_op="perf test (2)" from_port="performance" to_port="result 2"/>
      <connect from_op="perf test (2)" from_port="example set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

MartinLiebig · April 2018

Hi @kypexin,

Isn't that exactly what you would expect? AUPRC is NOT independent of class balance. If you add more and more of one class, then the precision will go down for the other class. Thus the curve becomes flatter and the integral smaller. 0.5 is thus no longer the lower bound.

Best,

Martin
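
To make the "0.5 is no longer the lower bound" point concrete, here is a short worked calculation. It assumes a classifier whose scores carry no information (a random ranking). The symbols P (number of positives), N (number of negatives) and π (positive fraction) are introduced here for illustration and do not appear in the thread.

```latex
\pi = \frac{P}{P + N},
\qquad
\mathrm{Precision}_{\mathrm{random}}(r) \approx \pi \quad \text{for every recall } r,
\qquad
\mathrm{AUPRC}_{\mathrm{random}} \approx \pi
```

For the balanced 492:492 sample in the attached process this gives π = 0.5. For a hypothetical 1:500 class ratio it gives π = 1/501 ≈ 0.002, so under this assumption even a very small AUPRC can sit above the no-skill level.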

kypexin · April 2018

Hi @mschmitz

Honestly, no, I expected it to be exactly the other way around.

If we assume that the curve shows precision against the recall of the same positive class (in our case '1'), then varying the recall of the positive class gives the following:

Low recall, high precision (6/100)

Screenshot 2018-04-26 09.38.56.png

High recall, low precision (93/6)

Screenshot 2018-04-26 09.39.54.png

Around optimum (80/80)

Screenshot 2018-04-26 09.41.25.png

Or am I interpreting AUPRC completely wrong? (I have never used it in practice before.)

kypexin · April 2018

PS @mschmitz, to give you more intuition, this is the PR curve I am getting on my data (at least what I understand to be that curve):

Screenshot 2018-04-26 10.35.55.png

MartinLiebig · April 2018

Hi @kypexin,

What happens if you switch the class balance? It should go down, right?

Best,

Martin

kypexin · April 2018

Not sure if I got you right, @mschmitz

If I just remap the classes, I get AUPRC = 0.999 and also this (obviously, for the majority class it will be really close to 1):

Screenshot 2018-04-26 10.47.59.png

Screenshot 2018-04-26 10.49.17.png

However, this still does not give me an intuition for why AUPRC = 0.010 in the first case, which is not what I would logically expect.

MartinLiebig · April 2018

Hey @kypexin,

Here is how I see this. If you have a different class balance, you transform the space. Essentially, recall for your positive class stays the same, but the precision at a given recall point changes. It may look like this:

2018-04-26 11.14.35.jpg Upper: Normal PR-Curve, Lower with a different Class Ratio

If you have a look at the math, you can see Precision as a function of recall like this:

2018-04-26 11.09.38.jpg

Adding more negative values will lead to more FP (false positives) and thus lower precision. So naturally AUPRC drops as the class balance changes (if the classifier does not counter this).
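
The dependence of precision on class balance can be sketched numerically. The snippet below is a minimal illustration, not the operator's code: it assumes the classifier's true positive rate and false positive rate stay fixed at one operating point while negatives are added, and uses precision = TP / (TP + FP). The class name, the rates 0.8 and 0.01, and the count of 500 positives are hypothetical values chosen for the example.

```java
// Illustrative sketch (not the Operator Toolbox code): precision at a fixed
// operating point as the number of negative examples grows.
final class PrecisionVsClassRatio {

    /** precision = TP / (TP + FP), with TP = tpr * positives and FP = fpr * negatives. */
    static double precision(double tpr, double fpr, long positives, long negatives) {
        double tp = tpr * positives;
        double fp = fpr * negatives;
        double predictedPositive = tp + fp;
        // Nothing predicted positive: report 0, mirroring the NaN guard in the thread's code.
        return predictedPositive == 0.0 ? 0.0 : tp / predictedPositive;
    }

    public static void main(String[] args) {
        double tpr = 0.8;      // hypothetical recall of the positive class
        double fpr = 0.01;     // hypothetical false positive rate
        long positives = 500;  // hypothetical number of positive examples
        for (long ratio : new long[] {1, 10, 100, 500}) {
            long negatives = positives * ratio;
            System.out.printf("class ratio 1:%d -> precision %.3f%n",
                    ratio, precision(tpr, fpr, positives, negatives));
        }
    }
}
```

Recall is unchanged across the four ratios, while precision at that recall falls as negatives are added, which is the transformation of the PR space described above.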

kypexin · April 2018

Hey @mschmitz

I totally agree that adding more negative examples lowers precision, so AUPRC naturally drops as the class balance changes. But at the same time, I observe that the influence of class imbalance on AUPRC is really lower than we would expect.

I ran tests on datasets with class ratios of 1:1, 1:10, 1:100 and 1:500. Below are the PR curves for those cases. As we can see, AUPRC drops as the imbalance increases, but not by much.

Screenshot 2018-04-26 15.01.17.png class ratio 1:1 Screenshot 2018-04-26 15.01.50.png class ratio 1:10

Screenshot 2018-04-26 15.02.24.png class ratio 1:100 Screenshot 2018-04-26 15.03.12.png class ratio 1:500

So the question is why the operator itself reports AUPRC values that do not match these plots, unless of course I am making some serious mistake.

I am attaching the process used to estimate these curves, plus my labelled test dataset, from which different ratios can be sampled.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve scored data" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Local Repository/kaggle - fraud/data/scored data 500 - 250000"/>
      </operator>
      <operator activated="true" class="concurrency:loop_parameters" compatibility="8.1.003" expanded="true" height="103" name="Loop Parameters" width="90" x="313" y="34">
        <list key="parameters">
          <parameter key="Select Recall.min_recall" value="[0.0;1.0;100;linear]"/>
        </list>
        <parameter key="log_all_criteria" value="true"/>
        <process expanded="true">
          <operator activated="true" class="select_recall" compatibility="8.1.003" expanded="true" height="82" name="Select Recall" width="90" x="45" y="34">
            <parameter key="min_recall" value="0.8"/>
            <parameter key="positive_label" value="1"/>
          </operator>
          <operator activated="true" class="apply_threshold" compatibility="8.1.003" expanded="true" height="82" name="Apply Threshold" width="90" x="179" y="34"/>
          <operator activated="true" class="performance" compatibility="8.1.003" expanded="true" height="82" name="perf test" width="90" x="313" y="34"/>
          <operator activated="true" class="operator_toolbox:performance_auprc" compatibility="1.0.000" expanded="true" height="82" name="perf test (2)" width="90" x="447" y="34">
            <parameter key="main_criterion" value="AUPRC"/>
            <parameter key="accuracy" value="false"/>
            <parameter key="AUPRC" value="true"/>
          </operator>
          <operator activated="true" class="performance_to_data" compatibility="8.1.003" expanded="true" height="82" name="Performance to Data" width="90" x="581" y="34"/>
          <connect from_port="input 1" to_op="Select Recall" to_port="example set"/>
          <connect from_op="Select Recall" from_port="example set" to_op="Apply Threshold" to_port="example set"/>
          <connect from_op="Select Recall" from_port="threshold" to_op="Apply Threshold" to_port="threshold"/>
          <connect from_op="Apply Threshold" from_port="example set" to_op="perf test" to_port="labelled data"/>
          <connect from_op="perf test" from_port="performance" to_op="perf test (2)" to_port="performance"/>
          <connect from_op="perf test" from_port="example set" to_op="perf test (2)" to_port="labelled data"/>
          <connect from_op="perf test (2)" from_port="performance" to_op="Performance to Data" to_port="performance vector"/>
          <connect from_op="Performance to Data" from_port="example set" to_port="output 1"/>
          <connect from_op="Performance to Data" from_port="performance vector" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
          <portSpacing port="sink_output 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve scored data" from_port="output" to_op="Loop Parameters" to_port="input 1"/>
      <connect from_op="Loop Parameters" from_port="output 1" to_port="result 1"/>
      <connect from_op="Loop Parameters" from_port="output 2" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

kypexin · May 2018

Hey @mschmitz - could you please elaborate regarding my latest plots / messages in this thread?

This issue is still not clear to me.

MartinLiebig · June 2018

@kypexin,

I've done some tests. Attached is my project on your data. For me the AUPRC drops heavily, as expected.

100	0.6869897886710802
200	0.540999353555299
300	0.453043673642775
400	0.39372295554318193
500	0.3493142261965152

The left column is the number of negative examples and the right one is the AUPRC. There is also a way to visualize the AUPRC exactly as the operator computes it. I think one good question is how to handle missing values in the integral. Since I copied most of the code from AUC, the handling is the same.

BR,

Martin

kypexin · June 2018

@mschmitz -- please look.

If we take each sample size separately (I did it for the value of 100 for example) and then visualize precision against recall, we can get two meaningful (to my understanding) charts:

Screenshot 2018-06-18 14.03.59.png Precision vs. Recall, as series

Here we see that as recall goes from 0 to 1, precision slowly decreases all the way from 1 to 0.5. Correct?

Screenshot 2018-06-18 14.04.36.png

In a scatter plot, we basically see the same, just from a different perspective.

Now, my question is -- can you please point out which part of this plot exactly counts as the area under the curve? If we connect all the points together, we basically get a precision-recall curve, right? So what is the area under it?

PS same plots for sample size = 500

Screenshot 2018-06-18 14.14.28.png Screenshot 2018-06-18 14.14.15.png

Sorry, my brain has already started to smoke )

MartinLiebig · June 2018

Hey @kypexin,

To be honest, I've only adapted our AUC performance measure: I copied all of the code and only changed TPR/FPR to precision/recall. So the Java code is fairly similar to the one for AUC.

#1 Generate these points

These are the same as for AUC. That's why we can use Extract ROC Curve.

#2

For each point in rocData:

double fpDivN = point.getFalsePositives() / rocData.getTotalNegatives();

double tpDivP = point.getTruePositives() / (point.getTruePositives() + point.getFalsePositives());
if (Double.isNaN(tpDivP)) {
   tpDivP = 0;
}

This is Recall and Precision. Then we do the "summation"

double width = fpDivN - last[0];
double leftHeight = last[1];
double rightHeight = tpDivP;
Double aux = leftHeight * width + (rightHeight - leftHeight) * width / 2;
if (!aux.isNaN()) {
   aucSum += aux;
}

and store the last value:

last = new double[] { fpDivN, tpDivP };

That makes a lot of sense to me..?

Cheers,

Martin

kypexin · June 2018

Well @mschmitz, in the case of a ROC curve it is clear what the area under it is. Looking at the visualizations I made for the PRC, it is not really clear, because I literally cannot see where and why AUPRC = 0.35 for sample size 500, and this is the problem here. A curve with an area under it lower than 0.5 would hang below the diagonal line, wouldn't it?

MartinLiebig · June 2018

Hi,

There was at least one bug. For some crazy reason, the recall calculation was for the negative class, while the precision was for the positive class. It's fixed now.

Do you know a good way to check if it's working as expected?

BR,

Martin
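
One way to check the fix is to compare the operator against a small reference computation on data where the answer is known. The following is a minimal sketch, not the operator's actual code: it assumes recall = TP / (total positives) on the x-axis, precision = TP / (TP + FP) on the y-axis, the same trapezoidal summation as in the snippet earlier in the thread, and a starting point of (recall 0, precision 1), which is a convention chosen here. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative reference computation (not the Operator Toolbox code):
// trapezoidal area under the precision-recall curve for the positive class.
final class AuprcSketch {

    /**
     * @param scores confidence for the positive class, one per example
     * @param labels true if the example is actually positive
     * @return area under the PR curve, or NaN if there are no positives
     */
    static double auprc(double[] scores, boolean[] labels) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        // Highest confidence first, so the threshold is lowered step by step.
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> -scores[idx]));

        long totalPositives = 0;
        for (boolean label : labels) {
            if (label) {
                totalPositives++;
            }
        }
        if (totalPositives == 0) {
            return Double.NaN;
        }

        double area = 0.0;
        double lastRecall = 0.0;
        double lastPrecision = 1.0; // assumed convention for the point at recall = 0
        long tp = 0;
        long fp = 0;
        int i = 0;
        while (i < n) {
            double threshold = scores[order[i]];
            // Consume all examples tied at this score before emitting a curve point.
            while (i < n && scores[order[i]] == threshold) {
                if (labels[order[i]]) {
                    tp++;
                } else {
                    fp++;
                }
                i++;
            }
            double recall = (double) tp / totalPositives;   // positive-class recall
            double precision = (double) tp / (tp + fp);     // positive-class precision
            double width = recall - lastRecall;
            area += lastPrecision * width + (precision - lastPrecision) * width / 2.0;
            lastRecall = recall;
            lastPrecision = precision;
        }
        return area;
    }
}
```

As a sanity check under these assumptions, a perfect ranking such as scores {0.9, 0.8, 0.2, 0.1} with labels {true, true, false, false} should evaluate to 1.0, and reversing the labels should give a value below the positive fraction of 0.5. Plotting the (recall, precision) points produced inside the loop would also give the curve whose area is being reported.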

kypexin · June 2018

Did you update the operator itself? I could test it as soon as it is available.

But still, another really important thing to consider in the future is a curve visualization. Because, as we saw, the number itself often does not give much intuition.

MartinLiebig · June 2018

The operator is updated and will be released in the next release of the toolbox. I've taken the class' recall..

~Martin

kypexin · June 2018

Thanks Martin! Truly appreciate your help.

RNarayan · April 2021

Hello @mschmitz

Can you please advise where I can get hold of the said operator with PRC curve visualisation?

Thanks
Narayan

Telcontar120 · April 2021

You need to install the Operator Toolbox (free extension) and there is an operator in that called Performance (AUPRC)
