Anomaly extension: Generate ROC seems to mirror FP/FN rate

MaartenKMaartenK Member Posts: 10 Contributor I
edited November 2 in Help
I am using the Anomaly Detection extension against an artificial dataset. I use three algorithms to assign an anomaly score: k-NN Global, uCBLOF, and LOF. My dataset contains a label marking the anomalies that are supposed to show up, and I use the Generate ROC operator to measure performance. What Generate ROC does first is choose a threshold for the outlier score and add a boolean "prediction" attribute. I noticed that in the resulting confusion matrices the FP and FN counts are always identical. It looks as if it chooses the threshold based on the label when generating the outliers, which seems odd.

The dataset contains 1676 items labeled 'true'.
Please see below a histogram of the scores, colored by the label. As can be seen, the algorithms fail to assign a high score to the outliers. This is as expected, because our dataset contains global anomalies. Note that the Y-axis is logarithmic for readability.

Below that is the resulting confusion matrix from Generate ROC. It contains 1676 FNs, which is explainable if you look at the scores.
However, it also contains 1676 FPs, which is suspicious. I looked in the dataset and there are indeed 1676 predictions with the value "true", so it is not a rendering issue.

Am I overlooking something?
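One way to see why identical FP and FN counts could arise: if the chosen threshold flags exactly as many examples as there are labeled outliers, the confusion matrix is forced into this symmetry, since predicted positives = TP + FP and actual positives = TP + FN. A minimal sketch with hypothetical data (not the actual dataset):

```python
# Confusion-matrix identity:
#   predicted positives = TP + FP
#   actual positives    = TP + FN
# If both counts are equal, FP == FN must follow.

actual    = [1, 1, 1, 0, 0, 0, 0, 0]   # 3 labeled outliers
predicted = [1, 0, 0, 1, 1, 0, 0, 0]   # 3 flagged outliers (same count)

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

print(tp, fp, fn)  # FP and FN come out equal whenever the two counts match
```

This would explain 1676 FPs alongside 1676 FNs without any per-example peeking at the label: only the label *count* is used.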
 

Answers

  • MaartenKMaartenK Member Posts: 10 Contributor I
In the meantime I looked a bit further into this problem and reread Goldstein's article comparing anomaly detection algorithms. I believe the behaviour described above is intentional. Since there is no clear rule for how to choose a threshold on outlier scores, the Generate ROC operator must pick a threshold for each algorithm that allows the different algorithms to be compared. So I think it starts from the top and stops when the FP/FN counts are symmetrical.
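Under that reading, the threshold choice can be sketched as flagging the top-k scores, where k is the number of labeled anomalies; this automatically makes FP equal FN. A hypothetical sketch (the function name and data are illustrative, not the extension's actual code):

```python
def threshold_by_label_count(scores, labels):
    """Pick the threshold that flags exactly as many points as there
    are labeled anomalies (hypothetical reading of Generate ROC)."""
    k = sum(labels)                        # number of labeled anomalies
    ranked = sorted(scores, reverse=True)
    return ranked[k - 1]                   # k-th highest score

scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [1,   0,   1,   0,   0,   0]      # 2 labeled anomalies

t = threshold_by_label_count(scores, labels)
predicted = [int(s >= t) for s in scores]

fp = sum(l == 0 and p == 1 for l, p in zip(labels, predicted))
fn = sum(l == 1 and p == 0 for l, p in zip(labels, predicted))
print(fp, fn)  # equal counts, mirroring the reported 1676/1676
```

Note that ties at the threshold could flag more than k points; in the untied case the FP/FN symmetry always holds.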