Anomaly detection using the Deep Learning operator in RapidMiner

Lobbie Member Posts: 10 Contributor I
edited December 2018 in Help

Hi,

 

I have a 10,000-example set with a highly imbalanced class, i.e. Success_Flag = 0 (9,800 records) and Success_Flag = 1 (200 records).  I tried the One-Class SVM from the SVM (LibSVM) operator with cross-validation and an optimisation grid to identify the anomalies, and the result is OK.  I was hoping to do the same using the Deep Learning operator but unfortunately, it does not accept a one-class classification problem.  I then thought of "tricking" the Deep Learning operator by:
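
For reference, the One-Class SVM baseline described above can be sketched outside RapidMiner with scikit-learn's OneClassSVM (an assumption on my part; the original uses the LibSVM operator inside RapidMiner, and the synthetic data, nu value, and kernel settings below are purely illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Synthetic stand-ins for the 9800 "normal" (Success_Flag = 0) rows
# and the 200 anomalous (Success_Flag = 1) rows.
normal = rng.normal(0.0, 1.0, size=(9800, 5))
anomalies = rng.normal(4.0, 1.0, size=(200, 5))

# A One-Class SVM is fitted on the normal class only; nu bounds the
# fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.02, gamma="scale")
ocsvm.fit(normal)

# predict() returns +1 for inliers and -1 for outliers.
pred_normal = ocsvm.predict(normal)
pred_anom = ocsvm.predict(anomalies)
print("fraction of anomalies flagged:", (pred_anom == -1).mean())
```

The key design point is that the model never sees the anomalous class during training; everything far from the learned boundary is flagged as -1.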

 

1.  Separate the example set into 2 sets based on their classes, i.e. 9800 class 0 and 200 class 1.

2.  Partition the 9800 class 0 records into 7840 observations (80%) and 1960 observations (20%).

3.  From the 200 class 1 records, randomly sample 1 record with replacement.

4.  Append this 1 record (class 1) to the 7840 (80%) = 7841 observations for training (now there are 2 classes).

5.  Append the 1960 (20%) to the 200 (class 1) = 2160 observations for testing (now there are also 2 classes).

6.  Train a Deep Learning classifier with 10-fold CV using the 7841 training observations within an optimisation grid.

7.  Test the optimised hyperparameters for the Deep Learning classifier on the 2160 test set.
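
The seven steps above can be sketched in plain Python (the records, field names, and seed below are placeholders, not from the original process):

```python
import random

random.seed(1)

# Placeholder example set: 9800 class-0 and 200 class-1 records.
class0 = [{"Success_Flag": 0, "id": i} for i in range(9800)]
class1 = [{"Success_Flag": 1, "id": i} for i in range(200)]

# Steps 1-2: shuffle class 0 and partition it 80/20.
random.shuffle(class0)
class0_train, class0_test = class0[:7840], class0[7840:]

# Step 3: sample a single class-1 record (with replacement).
seed_anomaly = random.choice(class1)

# Step 4: training set = 7840 class-0 rows + 1 class-1 row = 7841 rows.
train = class0_train + [seed_anomaly]

# Step 5: test set = 1960 class-0 rows + all 200 class-1 rows = 2160 rows.
test = class0_test + class1

print(len(train), len(test))  # 7841 2160
```

Note that the single class-1 record exists only so that the training label has two classes; the classifier is effectively learning the class-0 distribution.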

 

The test results looked much better than the One-Class SVM results.  Hence my questions are:

 

A.  Is my "trick" correct from a data mining best practice perspective?

B.  Are there any potential problems with what I did?

C.  If my "trick" is incorrect, what are your recommendations for doing anomaly detection with deep learning without resorting to integrating R or Python scripts?

 

Thanks,

Lobbie

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi,

     

    since @Ingo's Hyper Hyper is sound, I would argue that this algorithm is fine as well.

     

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    it sounds like you have the same 200 examples in train and test. You cannot trust the performance if you have an overlap.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Lobbie Member Posts: 10 Contributor I

    Hi Martin,

     

    No, and as I explained, only 1 record, randomly sampled with replacement, is included in the training set.  The test set contains all 200 class 1 records.  Please see points 4 and 5:

     

    4.  I then append this 1 record (class 1) to the 7840 (80% of class 0) = 7841 observations for training (now there are 2 classes)

    5.  I then append the 1960 (20% of class 0) to the 200 (class 1) = 2160 observations for testing (now also there are 2 classes).

     

    Here's how it looks graphically,

    [attached screenshot: 2017-09-19_8-13-57.jpg]

     

     

    Regards,

    Lobbie

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Oh,

     

    sorry. In this case, it should be valid. Interesting that it works, though... @IngoRM @RalfKlinkenberg am I missing something?

     

    Best.

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Lobbie Member Posts: 10 Contributor I

    Hi Martin,

     

    Thanks, and it did work.  With the One-Class SVM I got about 11% precision in predicting class 1 as an anomaly, and with the "trick" using the Deep Learning operator I got about 19% precision in predicting class 1.

     

    Even though there is 1 observation of class 1 in training, my hunch is that it is too small to carry statistically significant weight, so the model is trained mainly on class 0.  This overcomes the limitation of the Deep Learning operator only accepting >= 2 class labels.  Obviously, this will overfit the trained model, and that is the point: when I test the model with the test dataset containing the 200 class 1 records, the model should be able to identify and predict the anomalies accordingly.
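
As a side note on the precision figures quoted above: precision for the anomaly class is TP / (TP + FP).  A minimal illustration with made-up confusion-matrix counts (these numbers are not from the thread):

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Of everything the model flags as class 1, what fraction truly is?"""
    return true_positives / (true_positives + false_positives)

# Hypothetical counts: if the model flags 1000 test rows as anomalies
# and 190 of them are real class-1 rows, precision is 190 / 1000 = 0.19,
# i.e. the ~19% figure quoted for the Deep Learning "trick".
print(precision(190, 810))  # 0.19
```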

     

    My concern, though, is whether this is statistically sound and does not violate any good data mining practices.  If it is not, then I would appreciate your guidance on how I can do anomaly detection using the Deep Learning operator without resorting to R or Python scripts.

     

    Thanks,

    Lobbie

  • Lobbie Member Posts: 10 Contributor I

    Hi Martin,

     

    Just wondering if you have any thoughts if the trick I mentioned above is statistically sound?

     

    Thanks,

    Lobbie

     
