Anomaly extension: Generate ROC seems to mirror FP/FN rate
I am using the Anomaly Detection extension against an artificial dataset. I use three algorithms to assign an anomaly score: K-NN Global, uCBLOF and LOF. My dataset contains a label marking the anomalies that are supposed to show up, and I use the Generate ROC operator to measure performance. Generate ROC first chooses a threshold for the outlier score and adds a boolean "prediction". I noticed that in the resulting confusion matrices the FP and FN counts are always identical. It seems as if it chooses the threshold based on the label to generate the outliers, which seems odd.
The dataset contains 1676 items labeled 'true'.
Please see below a histogram of the scores, colored by label. As can be seen, it fails to assign a high score to the outliers. This is as expected because our dataset contains global anomalies. Note that the Y-axis is logarithmic for readability.
Below that is the resulting confusion matrix from Generate ROC. It contains 1676 FNs, which is explainable if you look at the scores.
However, it also contains 1676 FPs, which is suspicious. I looked in the dataset and there are indeed 1676 predictions with the value "true", so it is not a display issue.
Am I overlooking something?
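For reference, here is a minimal sketch (in Python, not the extension's actual code) of how a single score threshold turns into a boolean prediction and confusion-matrix counts; the array names are placeholders for the real attributes in the process. The comments also spell out why an equal number of predicted and labelled outliers forces FP to equal FN.

```python
# Minimal sketch, not the extension's code: how one score threshold yields a
# boolean "prediction" and the confusion-matrix counts. The array names
# (scores, labels) are placeholders for the real attributes in the process.
import numpy as np

def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN for predictions = (score >= threshold)."""
    predictions = scores >= threshold
    tp = int(np.sum(predictions & labels))
    fp = int(np.sum(predictions & ~labels))
    fn = int(np.sum(~predictions & labels))
    tn = int(np.sum(~predictions & ~labels))
    return tp, fp, fn, tn

# Note on the observation above: predicted positives = TP + FP and actual
# positives = TP + FN. If the threshold flags exactly as many instances as
# there are labelled outliers (1676 here), those two sums are equal, so
# FP == FN follows by simple arithmetic.
```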
Best Answer
MaartenK Member Posts: 17 Contributor II
I had contact with Markus Goldstein. The behaviour is as designed. The Generate ROC component will try every threshold, from top to bottom, on all instances sorted by score. It will choose an optimum as the threshold, to allow for a fair comparison of different algorithms.
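In other words, the operator sweeps the score-sorted instances and evaluates every possible cut-off. A rough illustration of that sweep is sketched below; the optimisation criterion shown (plain accuracy) is only an assumption, since the answer does not say which quality measure the operator actually maximises.

```python
# Rough illustration of the described behaviour, not the operator's source:
# try every distinct score as a threshold, from the highest down, and keep
# the one that scores best. Accuracy is used here as a stand-in criterion;
# the extension may optimise a different measure.
import numpy as np

def pick_optimal_threshold(scores, labels):
    best_threshold, best_quality = None, -1.0
    for threshold in np.sort(np.unique(scores))[::-1]:
        predictions = scores >= threshold
        quality = float(np.mean(predictions == labels))  # stand-in criterion
        if quality > best_quality:
            best_threshold, best_quality = threshold, quality
    return best_threshold
```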