

# AUC calculation

Hi,

I just wanted to ask why the implementors of the AUC calculation first considered the trapezoid and then changed the implementation so that the trapezoid is no longer considered. See below:


Benedikt

    package com.rapidminer.tools.math;
    ...
    public class ROCDataGenerator implements Serializable {
        ...
        public double calculateAUC(ROCData rocData) {
            ...
            // if (last != null) {
            //     aucSum += ((tpDivP - last[1]) * (fpDivN - last[0]) / 2.0d) + (last[1] * (fpDivN - last[0]));
            // }
            // only rectangle
            if (last != null) {
                aucSum += last[1] * (fpDivN - last[0]);
            }
            ...
        }
        ...
    }


## Answers

Not using the trapezoid calculation gives a more pessimistic estimate and, at least as far as we are aware, is more often used by other statistical software packages. We planned to add an option (or a second criterion, e.g. "AUC_trapez"), and hence the code fragment stayed as a commented-out block. This second option will probably be available in some future version.

Cheers,

Ingo
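To make the difference between the two rules concrete, here is a minimal sketch. It is not the RapidMiner code: the class `AucRules`, its method names and the `{fpr, tpr}` point layout are assumptions made for illustration. It sums the area under a list of ROC points, once with rectangles only and once with trapezoids.

```java
// Illustrative sketch only; names and data layout are hypothetical, not RapidMiner's API.
class AucRules {

    // Each point is {falsePositiveRate, truePositiveRate}, ordered along the curve.
    // Rectangle rule: use only the height of the previous point.
    static double rectangleAuc(double[][] roc) {
        double sum = 0.0;
        for (int i = 1; i < roc.length; i++) {
            double width = roc[i][0] - roc[i - 1][0];
            sum += roc[i - 1][1] * width;
        }
        return sum;
    }

    // Trapezoid rule: average the heights of the previous and the current point.
    static double trapezoidAuc(double[][] roc) {
        double sum = 0.0;
        for (int i = 1; i < roc.length; i++) {
            double width = roc[i][0] - roc[i - 1][0];
            sum += (roc[i][1] + roc[i - 1][1]) * width / 2.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A diagonal curve, as produced when positives and negatives share the same confidence.
        double[][] roc = { {0.0, 0.0}, {0.5, 0.5}, {1.0, 1.0} };
        System.out.println(rectangleAuc(roc)); // prints 0.25
        System.out.println(trapezoidAuc(roc)); // prints 0.5
    }
}
```

The two rules only differ on segments where both rates change at once, which typically happens when examples of both classes have tied confidences. On purely horizontal or vertical segments they give the same area.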

Thanks for the answer.

Benedikt

Today I stumbled upon this rather strange result:

Input Data: As you can see, the ranking is perfect. However, the resulting AUC was 0.5 (process and data files are attached below).

As you may already know, a genius once proved that the AUC is the probability that a randomly chosen example of the positive class is ranked higher than a randomly chosen example of the negative class. I know that this is correct for the trapezoid formula, but I am not sure if this is also true for your formula.

kind regards,

Steffen

PS: I am using the RapidMiner 4.4 release.

Process: Dat-File AML-File
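The probabilistic reading of the AUC mentioned above can be checked directly on small data sets. The following is a sketch under stated assumptions: the class `PairwiseAuc` and its method are hypothetical, and counting a tied pair as half a win is a convention chosen here (it is the one that matches the trapezoid formula). It counts, over all positive/negative pairs, how often the positive example receives the higher confidence.

```java
// Illustrative sketch only; names are hypothetical and not part of RapidMiner.
class PairwiseAuc {

    // Fraction of (positive, negative) pairs in which the positive example
    // has the higher score. Assumption: a tied pair counts as half a win.
    static double pairwiseAuc(double[] scores, boolean[] isPositive) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!isPositive[i]) {
                continue; // outer loop runs over positives only
            }
            for (int j = 0; j < scores.length; j++) {
                if (isPositive[j]) {
                    continue; // inner loop runs over negatives only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Perfect ranking: every positive scores above every negative.
        double[] scores = { 0.9, 0.8, 0.3, 0.1 };
        boolean[] labels = { true, true, false, false };
        System.out.println(pairwiseAuc(scores, labels)); // prints 1.0
    }
}
```

Under this definition a perfect ranking always yields 1.0, so it is a convenient reference value when an AUC implementation reports something unexpected. The double loop is quadratic in the number of examples, which is fine for a sanity check but not for large data sets.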

I can confirm that this behaviour has not changed in RapidMiner 4.5.

In fact, the computation was not exact since the very first data point was incorrectly dropped.

Thanks for pointing this out again.

Best,

Simon