Answers
Could you post the process generating this data here?
Greetings,
Sebastian
Regards
Dan
I understand that this seems hard to believe, but as far as I can see the calculation is indeed correct:
- if you only have the reference points (0,0) and (1,1), the trapezoidal calculation of the AUC delivers exactly half of the rectangle, which results in 0.5
- the optimistic calculation is also easy to understand: here the upper bounds of each rectangle are used, and this results in 1.0
- the one thing which might surprise is why the pessimistic calculation also results in 1 and hence is better than 0.5: here the lower rectangles are used, which in this case are exactly the same rectangles as in the optimistic case
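As a sketch of the three variants Ingo describes (this is not RapidMiner's actual code, just the standard segment-wise approximations over (FPR, TPR) points), the calculations can be written as follows. Note that on the bare two-point curve the pessimistic rectangle has height 0; the coincidence with the optimistic case that Ingo mentions depends on the shape of the actual curve.

```python
def auc(points, mode="trapezoid"):
    """Area under a piecewise-linear ROC curve given as (fpr, tpr) points.

    mode selects how each segment is approximated:
      "trapezoid"   - average of the two segment heights
      "optimistic"  - rectangle at the upper (max) height
      "pessimistic" - rectangle at the lower (min) height
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        width = x1 - x0
        if mode == "trapezoid":
            area += width * (y0 + y1) / 2
        elif mode == "optimistic":
            area += width * max(y0, y1)
        elif mode == "pessimistic":
            area += width * min(y0, y1)
    return area

# Only the reference points (0,0) and (1,1):
two_points = [(0.0, 0.0), (1.0, 1.0)]
print(auc(two_points, "trapezoid"))    # 0.5
print(auc(two_points, "optimistic"))   # 1.0
print(auc(two_points, "pessimistic"))  # 0.0
```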
Cheers,
Ingo
Thanks for the explanation. Actually, the ROC curve in this case contains the point (0,1), the so-called "perfect classification" point - see http://en.wikipedia.org/wiki/Receiver_operating_characteristic
So you have the points (0,1) and (1,1) in the curve graph.
You can also see the drawing of the ROC curve produced by RM in this case: indeed, the area under this curve is 1. Therefore the AUC indicator should be calculated as 1.
Moreover, please note that an AUC of 0.5 is generally achieved by random classifiers (which give, for instance, an equal number of good and bad answers, assuming positive and negative classes of the same size). This does not fit the particular decision tree I provided, which happens to be a perfect classifier (accuracy = 1).
Also, it is widely accepted that AUC is one of the indicators of the quality of a binary classifier. As said, the above decision tree is a perfect classifier, so it is natural for it to have the highest AUC, as opposed to an AUC of 0.5.
So everything indicates that the AUC should be calculated as 1 here. This would also be consistent with the optimistic and pessimistic calculations.
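Dan's argument can be checked numerically. A minimal sketch of the trapezoidal rule over (FPR, TPR) points (the helper below is hypothetical, not RapidMiner's implementation):

```python
def roc_auc_trapezoid(points):
    """Trapezoidal area under a piecewise-linear ROC curve given as (fpr, tpr) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# A perfect classifier's curve passes through (0,1), so the area is 1.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(roc_auc_trapezoid(perfect))    # 1.0

# Dropping the (0,1) point collapses the curve to the diagonal, giving 0.5.
collapsed = [(0.0, 0.0), (1.0, 1.0)]
print(roc_auc_trapezoid(collapsed))  # 0.5
```

This illustrates why the reported 0.5 only arises if the step point (0,1) is left out of the trapezoidal calculation.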
Best,
Dan
Cheers,
Ingo
Given that the author of the above code has recently posted here...
http://rapid-i.com/rapidforum/index.php/topic,2584.msg10537.html#msg10537
I took another look at the code and noticed that we have a binominal mapping/remapping of the label, without view creation, which changes the underlying data, generates an error, and is not necessary for the learner, like this... Disable these operators and the warnings disappear, and a rather different result emerges, like this...
Or am I missing something?
Toodle Pip!
Just checked, and this error in RM's AUC calculation has not been corrected since this was posted.
Here is a reminder: http://rapid-i.com/rapidforum/index.php/topic,6871.msg24166.html#msg24166
As one of the participants in this discussion asked - yes, perhaps this is the essential thing to understand. RM still makes this AUC calculation error two years later. Toodle Pip.
Dan
PS: By the way, the AUC is the area under the ROC curve. As reported to the Rapid-I team some time ago, RM produces some wrong results within the ROC analysis too.