# AUC > 1?

Dear All,

How come the ROC curve can get above 1?

http://img.ctrlv.in/img/51d099898f5d9.jpg

Best regards,

Wessel

## Answers

Indeed, AUC (the area under the red ROC curve) cannot be more than 1. In fact, the curve itself cannot go above the horizontal line y=1, and neither should the reddish area, which may indicate confidence intervals or some other measure of variation.

By the way, I have just checked again whether another error in the calculation of AUC, which I reported a couple of years ago (http://rapid-i.com/rapidforum/index.php/topic,2237.0.html), has been corrected, and it seems it has not. Perhaps the reported error was not well understood by the guys at RapidI or by other participants in that thread. The image below shows that the area under the (red) ROC curve, which is clearly 1, is still wrongly calculated by RM as AUC=0.5.

see image: http://postimg.org/image/9upjmo2ev/

People can try the following simple process, which builds a perfect classifier (that is, one with accuracy=1) and illustrates the bug. The AUC (here 0.5!!) should always be a value between the pessimistic AUC (here 1) and the optimistic AUC (here 1), because the ROC curve always lies between the pessimistic and the optimistic ROC curves. In the particular case of the classifier built below, all 3 ROC curves are identical (check the process's result), so the 3 areas under the curves should be equal, and they are not.

Dan
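The bounding argument in Dan's post can be checked with a small sketch. It assumes that the optimistic and pessimistic variants differ only in how examples with equal confidence are ordered (positives first versus negatives first), and that the interpolated variant draws a straight line across each group of ties. The name `auc_variants` and the toy data are hypothetical, and this is not RapidMiner's implementation.

```python
def auc_variants(scores, labels):
    """Return (pessimistic, interpolated, optimistic) AUC for labels in {0, 1}."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs that are ranked correctly or tied.
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    total = len(pos) * len(neg)

    pessimistic = wins / total                 # tied pairs count as errors
    interpolated = (wins + 0.5 * ties) / total # straight line across ties
    optimistic = (wins + ties) / total         # tied pairs count as correct
    return pessimistic, interpolated, optimistic


# Perfectly separated confidences: all three variants agree.
print(auc_variants([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))

# A single confidence level for every example: the variants spread apart.
print(auc_variants([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))
```

Under these assumptions the interpolated value is always bracketed by the other two, so three identical curves would be expected to give three identical areas.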

The light red area is not a confidence band, but the standard deviation of each data point based on the 10 iterations of the X-Validation. Of course, the actual value +/- the standard deviation can go above 1 or below 0.

Dan, as Ingo already posted in the old thread, the calculation of the AUC is not wrong. In the standard implementation (neither optimistic nor pessimistic), we smooth the line by interpolating between the steps of the function. If you have more than 2 confidence levels this works quite well. In this border case the result is admittedly a bit strange, but nevertheless correct. If there is need for further discussion, please let's continue in the respective thread at http://rapid-i.com/rapidforum/index.php/topic,2237.0.html

Best regards,

Marius

Thanks a lot for your information.

Now I understand why it shows a red spike above 1.

It's simply because the first part of the ROC has a large variation.

Therefore mean + standard deviation is almost always above 1.

As a possible variation, you could plot all 10 ROC iterations, and plot a fat line in the middle for average(ROC).

This may be a more faithful display of the ROC distribution.

Best regards,

Wessel
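The effect discussed in the last two posts can be illustrated with a few lines of Python. The ten TPR values below are made up for illustration (they are not read off the screenshot), the use of the population standard deviation is an assumption, and the helper name `band` is hypothetical.

```python
from statistics import mean, pstdev


def band(values):
    """Return (mean, mean - std, mean + std) for the per-fold values."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    return m, m - s, m + s


# Hypothetical TPR of 10 cross-validation folds at one low-FPR point.
# Every individual value is a valid rate, i.e. it lies in [0, 1].
fold_tprs = [1.0, 1.0, 1.0, 0.7, 1.0, 0.9, 1.0, 0.8, 1.0, 1.0]

m, lower, upper = band(fold_tprs)
print(f"mean={m:.3f} lower={lower:.3f} upper={upper:.3f}")
print("upper band above 1:", upper > 1.0)
```

No single fold exceeds 1, yet the upper edge of the band does, because the mean is already close to 1 and the spread is added on top of it. Plotting the individual fold curves, as suggested above, would avoid drawing values outside the valid range.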

Btw, with your next post you will enter the honorable circle of Hero Members. Congratulations!

~Marius

If this does not convince you, here is a second intuitive rationale. The AUC is one of the indicators of a model's performance. A model that randomly guesses the class has an AUC of about 0.5. In contrast, a model that always predicts the correct class should achieve a much better performance (that is, a higher AUC in this case) than a random guesser, shouldn't it? Such a perfect model is built by the process above, yet according to RM it is as good as a random guesser if performance is measured by AUC. This is an anomaly, and this anomaly is due to RM's wrong calculation of AUC. Consult (***) below for a reference.

Finally, look at the ROC your software draws in the process I provided: the area under that curve is indeed 1x1=1, since you have a rectangle there and not a triangle! The drawing is correct, but it is inconsistent with the calculation, which is clearly wrong.

Dan

(***) Reference: Tan, Steinbach, Kumar, Introduction to Data Mining, Addison Wesley, 2005

Subsection 5.7.2 on ROC: "The area under the ROC curve (AUC) provides another approach for evaluating which model is better on average. If the model is perfect, then its area under the ROC curve would equal 1. If the model simply performs random guessing, then its area under the ROC curve would equal 0.5"
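The two reference values in the quote can be reproduced with the trapezoidal rule. This is a minimal sketch: `area_under_roc` is a hypothetical helper, not a RapidMiner function, and the two point lists are idealized curves rather than output from the process.

```python
def area_under_roc(points):
    """Trapezoidal area under a ROC curve given as (fpr, tpr) points sorted by fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Perfect classifier: straight up to TPR=1, then across (a rectangle).
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Random guessing: the diagonal from (0, 0) to (1, 1) (a triangle).
diagonal = [(0.0, 0.0), (1.0, 1.0)]

print(area_under_roc(perfect))   # 1.0
print(area_under_roc(diagonal))  # 0.5
```

The rectangle and the triangle differ only in whether the corner point (0, 1) is part of the curve, which is the point at issue when a classifier produces very few distinct confidence levels.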

Best regards,

Marius

Marius, I think Nils has looked into this...

http://rapid-i.com/rapidforum/index.php/topic,4348.msg15895.html#msg15895