AUC calculation

Legacy User Member Posts: 0 Newbie
edited November 2019 in Help
Hi,

I just wanted to ask why the implementors of the AUC calculation first used the trapezoid rule and then changed the implementation so that the trapezoid is no longer considered. See below:

package com.rapidminer.tools.math;
...
public class ROCDataGenerator implements Serializable {
...
    public double calculateAUC(ROCData rocData) {
        ...
        // if (last != null) {
        //     aucSum += ((tpDivP - last[1]) * (fpDivN - last[0]) / 2.0d) + (last[1] * (fpDivN - last[0]));
        // }

        // only rectangle
        if (last != null) {
            aucSum += last[1] * (fpDivN - last[0]);
        }
        ...
    }
...
}
Benedikt

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello Benedikt,

    Not using the trapezoid calculation gives a more pessimistic estimate and is more commonly used by other statistical software packages (at least as far as we are aware). We planned to add an option (or a second criterion, e.g. "AUC_trapez"), which is why the code fragment stayed as a commented-out block. This second option will probably be available in some future version.

    Cheers,
    Ingo
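The difference between the two rules discussed above can be sketched in a few lines. This is a minimal illustration, not the RapidMiner implementation: the class `AucRules`, its method names, and the three-point ROC curve are assumptions made up for the example. Only the two update formulas are taken from the snippet in the question. Each point is `{fpDivN, tpDivP}`, that is, false positive rate followed by true positive rate.

```java
/** Sketch: rectangle vs. trapezoid area under a ROC curve given as {fpRate, tpRate} points. */
final class AucRules {

    /** Rectangle rule from the thread: the height of each strip is the previous point's TP rate. */
    static double rectangleAuc(double[][] roc) {
        double aucSum = 0.0;
        for (int i = 1; i < roc.length; i++) {
            double[] last = roc[i - 1];
            aucSum += last[1] * (roc[i][0] - last[0]);
        }
        return aucSum;
    }

    /** Trapezoid rule from the commented-out line: the rectangle plus the triangle on top of it. */
    static double trapezoidAuc(double[][] roc) {
        double aucSum = 0.0;
        for (int i = 1; i < roc.length; i++) {
            double[] last = roc[i - 1];
            double width = roc[i][0] - last[0];
            aucSum += (roc[i][1] - last[1]) * width / 2.0 + last[1] * width;
        }
        return aucSum;
    }

    public static void main(String[] args) {
        // Hypothetical ROC curve lying on the diagonal (illustrative data, not from the thread).
        double[][] roc = { {0.0, 0.0}, {0.5, 0.5}, {1.0, 1.0} };
        System.out.println(rectangleAuc(roc)); // 0.25
        System.out.println(trapezoidAuc(roc)); // 0.5
    }
}
```

For a ROC curve that never decreases, the rectangle rule can never exceed the trapezoid rule, because it leaves out the triangle above each strip. The two agree whenever the curve only moves in purely horizontal or purely vertical steps.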
  • Legacy User Member Posts: 0 Newbie
    Hi Ingo,

    thanks for the answer.

    Benedikt
  • steffen Member Posts: 347 Maven
    Revitalizing an old discussion...

    Today I stumbled upon this rather strange result:
    Input Data:

    id   confidence(1)  label  prediction(label)
    1.0  0.25           0      0
    2.0  1.0            1      1
    3.0  1.0            1      1
    4.0  1.0            1      1
    As you can see, the ranking is perfect. However, the resulting AUC was 0.5 (the process and data files are included below).

    As you may already know, a genius once proved that the AUC is the probability that a randomly chosen example of the positive class is ranked higher than a randomly chosen example of the negative class. I know that this holds for the trapezoid formula, but I am not sure whether it also holds for your formula.

    Kind regards,

    Steffen

    PS: I am using the RM 4.4 release.

    Process:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Steffen.Springer\Desktop\buggy\demoset.aml"/>
        </operator>
        <operator name="BinominalClassificationPerformance" class="BinominalClassificationPerformance">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="AUC" value="true"/>
        </operator>
    </operator>
    Dat-File

    1.0, 0.25, "0", "0"
    2.0, 1.0, "1", "1"
    3.0, 1.0, "1", "1"
    4.0, 1.0, "1", "1"
    AML-File

    <?xml version="1.0" encoding="windows-1252"?>
    <attributeset default_source="demoset.dat">
      <id
        name      = "id"
        sourcecol  = "1"
        valuetype  = "integer"/>

      <confidence_1
        name      = "confidence(1)"
        sourcecol  = "2"
        valuetype  = "real"/>

      <label
        name      = "label"
        sourcecol  = "3"
        valuetype  = "binominal">
        <value>0</value>
        <value>1</value>
      </label>

      <prediction
        name      = "prediction(label)"
        sourcecol  = "4"
        valuetype  = "binominal">
        <value>0</value>
        <value>1</value>
      </prediction>

    </attributeset>
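The ranking interpretation mentioned in this post can be checked directly on the four examples from the data file. The following is a minimal sketch, not RapidMiner code: the class `PairwiseAuc` and its method are hypothetical names, and counting a tie between a positive and a negative example as half a win is an assumption (it is the convention that matches the trapezoid rule).

```java
/** Sketch: AUC as the fraction of (positive, negative) pairs that are ranked correctly. */
final class PairwiseAuc {

    /**
     * Compares every positive example with every negative example.
     * A positive with the higher confidence counts as 1, a tie as 0.5 (assumed convention).
     */
    static double auc(double[] confidence, int[] label) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < label.length; i++) {
            if (label[i] != 1) {
                continue; // outer loop visits positives only
            }
            for (int j = 0; j < label.length; j++) {
                if (label[j] != 0) {
                    continue; // inner loop visits negatives only
                }
                pairs++;
                if (confidence[i] > confidence[j]) {
                    wins += 1.0;
                } else if (confidence[i] == confidence[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // The four examples from the data file above: confidence(1) and label.
        double[] confidence = { 0.25, 1.0, 1.0, 1.0 };
        int[] label = { 0, 1, 1, 1 };
        System.out.println(auc(confidence, label)); // 1.0
    }
}
```

All three positive examples outrank the single negative one, so under this definition the AUC for the data set is 1.0 rather than the reported 0.5.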


  • steffen Member Posts: 347 Maven
    I am afraid this message was pushed down by other threads so quickly that it may not have been noticed by the local authorities :). I do not ask for much; a simple "recognized" is enough.

    I can confirm that this behaviour has not changed in RM 4.5.
  • fischer Member Posts: 439 Maven
    Recognized. :-)

    In fact, the computation was not exact since the very first data point was incorrectly dropped.

    Thanks for pointing this out again.

    Best,
    Simon