AUC > 1?
Dear All,
How come the ROC can get above 1?
http://img.ctrlv.in/img/51d099898f5d9.jpg
Best regards,
Wessel
Answers
Indeed, the AUC (the area under the red ROC curve) cannot be more than 1. In fact, the curve itself cannot go above the horizontal line y = 1, and the reddish area, which may indicate confidence intervals or some other measure of variation, shouldn't go above that horizontal line either.
By the way, I have just checked whether another error in the calculation of AUC that I reported a couple of years ago (http://rapid-i.com/rapidforum/index.php/topic,2237.0.html) was corrected, and it seems it was not; perhaps the reported error was not well understood by the people at Rapid-I or the other participants in that thread. The image below shows that the area under the (red) ROC curve, which is clearly 1, is still wrongly calculated by RM as AUC = 0.5.
see image: http://postimg.org/image/9upjmo2ev/
You can try a simple process that builds a perfect classifier (that is, one with accuracy = 1); it illustrates the bug. The AUC (here 0.5!) should always be a value between the pessimistic AUC (here 1) and the optimistic AUC (here 1), because the ROC curve always lies between the pessimistic and the optimistic ROC curves. For this particular classifier, all 3 ROC curves are identical (check the process's result), so the 3 areas under the curves should be equal, and they are not.
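Outside of RapidMiner, the three AUC variants are easy to illustrate. Here is a minimal Python sketch (my own hypothetical code, not RM's implementation) of pessimistic, neutral, and optimistic AUC; for a perfect classifier all three staircases coincide and all three areas come out as 1:

```python
import numpy as np

def auc_with_ties(scores, labels, mode="neutral"):
    """Area under the ROC staircase for binary labels (1 = positive).

    mode controls how examples that share a confidence value are ordered:
      "optimistic"  - positives first within a tie group (upper staircase)
      "pessimistic" - negatives first within a tie group (lower staircase)
      "neutral"     - a diagonal segment through each tie group
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos, n_neg = labels.sum(), len(labels) - labels.sum()
    area = tpr = 0.0
    for s in np.unique(scores)[::-1]:               # tie groups, highest confidence first
        group = labels[scores == s]
        d_tpr = group.sum() / n_pos                 # vertical move of this group
        d_fpr = (len(group) - group.sum()) / n_neg  # horizontal move of this group
        if mode == "optimistic":                    # rise first, then run
            area += d_fpr * (tpr + d_tpr)
        elif mode == "pessimistic":                 # run first, then rise
            area += d_fpr * tpr
        else:                                       # diagonal through the tie group
            area += d_fpr * (tpr + d_tpr / 2)
        tpr += d_tpr
    return area

# a perfect classifier: every positive outscores every negative
labels = [1] * 5 + [0] * 5
scores = [1.0] * 5 + [0.0] * 5
for mode in ("pessimistic", "neutral", "optimistic"):
    print(mode, auc_with_ties(scores, labels, mode))   # all three print 1.0
```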
Dan
The light red area is not a confidence band, but the standard deviation of each data point over the 10 iterations of the X-Validation. Of course, the actual value plus or minus the standard deviation can exceed 1 (or go below 0).
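A tiny numeric sketch (with hypothetical fold values, just for illustration): if the TPR of one ROC point is 1 in most of the 10 folds, the mean plus the standard deviation lands above 1.

```python
import numpy as np

# TPR of one ROC point across 10 X-Validation folds (made-up values):
tpr = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.8, 0.8])
m, s = tpr.mean(), tpr.std()
print(m, s, m + s)   # mean 0.96, std 0.08, upper band 1.04 > 1
```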
Dan, as Ingo already posted in the old thread, the calculation of the AUC is not wrong. In the standard implementation (neither optimistic nor pessimistic), we smooth the line by interpolating between the steps of the function. If you have more than 2 confidence levels this works quite well. In this border case the result is admittedly a bit strange, but nevertheless correct. If more discussion is needed, please let's continue in the respective thread at http://rapid-i.com/rapidforum/index.php/topic,2237.0.html
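One plausible reading of that smoothing (a hedged sketch, not the actual RapidMiner code): replace each staircase step, a vertical rise plus the following horizontal run, with a straight chord. With many confidence levels the chords track the staircase closely; in the two-level border case the whole curve collapses onto the diagonal and the area becomes 0.5.

```python
def smoothed_auc(points):
    """AUC of a curve that replaces each staircase step (a vertical rise
    plus the following horizontal run) with a straight chord -- i.e. it
    interpolates linearly "between the steps" of the ROC."""
    kept = points[::2]                 # keep every second corner point
    if kept[-1] != points[-1]:
        kept = kept + [points[-1]]     # always close the curve at the last point
    area = 0.0
    for (x0, y0), (x1, y1) in zip(kept, kept[1:]):
        area += (x1 - x0) * (y0 + y1) / 2      # trapezoidal rule
    return area

# border case: a perfect classifier has the staircase (0,0) -> (0,1) -> (1,1);
# chording away the single corner leaves the diagonal, so the "smoothed"
# AUC is 0.5, although the step AUC is clearly 1.
print(smoothed_auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))   # 0.5
```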
Best regards,
Marius
Thanks a lot for your information.
Now I understand why it shows a red spike above 1.
It's simply because the first part of the ROC has a large variation.
Therefore mean + standard deviation is almost always above 1.
As a possible variation, you could plot all 10 ROC iterations, with a fat line in the middle for the average ROC.
This may be a more faithful display of the ROC distribution.
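Something like this rough matplotlib sketch (the fold curves here are made up, not RM output):

```python
import numpy as np
import matplotlib.pyplot as plt

# per-fold ROC curves from the 10 X-Validation iterations, each
# interpolated onto a common FPR grid (hypothetical data below)
grid = np.linspace(0, 1, 101)
rng = np.random.default_rng(0)
folds = [grid ** rng.uniform(0.2, 0.5) for _ in range(10)]

for tpr in folds:                       # thin grey line per iteration
    plt.plot(grid, tpr, color="grey", linewidth=0.8, alpha=0.6)
plt.plot(grid, np.mean(folds, axis=0),  # fat line for the average ROC
         color="red", linewidth=3, label="mean ROC")
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend()
plt.show()
```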
Best regards,
Wessel
Btw, with your next post you will enter the honorable circle of Hero Members. Congratulations!
~Marius
If this does not convince you, here is a second, intuitive rationale. The AUC is one indicator of a model's performance. A model that randomly guesses the class has an AUC of about 0.5. In contrast, a model that always predicts the correct class should achieve much better performance (that is, precisely a higher AUC) than a random guesser, shouldn't it? Such a perfect model is built by the process above, yet according to RM it is only as good as a random guesser when performance is measured by AUC. This is an anomaly, and the anomaly is due to RM's wrong calculation of the AUC. Consult (***) below for a reference.
Finally, look at the ROC curve your software draws for the process I provided: the area under that curve is indeed 1 x 1 = 1, as you have there a rectangle and not a triangle! The drawing is correct, but it is inconsistent with the calculation, which is clearly wrong.
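As an independent cross-check (a sketch using scikit-learn, which was not part of this thread), a perfect scorer gets AUC 1.0 and a random scorer about 0.5:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([1] * 50 + [0] * 50)

perfect = np.where(labels == 1, 1.0, 0.0)             # always right, full confidence
random_ = np.random.default_rng(0).uniform(size=100)  # random guesser

print(roc_auc_score(labels, perfect))   # 1.0
print(roc_auc_score(labels, random_))   # close to 0.5
```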
Dan
(***) Reference: Tan, Steinbach, Kumar, Introduction to Data Mining, Addison Wesley, 2005
Subsection 5.7.2 on ROC: "The area under the ROC curve (AUC) provides another approach for evaluating which model is better on average. If the model is perfect, then its area under the ROC curve would equal 1. If the model simply performs random guessing, then its area under the ROC curve would equal 0.5."
Marius, I think Nils has looked into this...
http://rapid-i.com/rapidforum/index.php/topic,4348.msg15895.html#msg15895