
Outlier detection (in real-time)

Chris_J_ Member Posts: 1 Learner III
Hello RapidMiner-Community,

I'm quite new to machine learning algorithms. I usually detect outliers by calculating a mean value and then using a threshold of 2 or 3 standard deviations to decide whether a data instance is normal or an outlier.
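For reference, the mean and standard deviation rule described above could look roughly like the following. This is only a minimal sketch: the function names `fit_bounds` and `is_outlier`, the sample values in `history`, and the choice of `k=3.0` are illustrative assumptions, not anything taken from RapidMiner.

```python
import statistics

def fit_bounds(values, k=3.0):
    """Offline step: compute the interval mean +/- k standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return mean - k * std, mean + k * std

def is_outlier(x, bounds):
    """Online step: a value outside the interval counts as an outlier."""
    low, high = bounds
    return x < low or x > high

# Hypothetical historical measurements of a single attribute.
history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.0]
bounds = fit_bounds(history, k=3.0)

print(is_outlier(10.4, bounds))  # inside the interval
print(is_outlier(14.0, bounds))  # far outside the interval
```

The bounds are computed once from historical data, so checking a new value costs only two comparisons. The rule treats one attribute at a time, which is why it stops being sufficient once several attributes have to be considered together.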

Now I'd like to add and test some other attributes for detecting outliers, and the method above no longer works well. The requirement is that the algorithm can detect outliers in real time (or with a short delay). The model itself can be computed during the day.

As I read in Chandola's paper "Anomaly detection: A survey", classification algorithms such as one-class SVM could be used, because the testing phase is fast once the model is trained. Clustering methods may also be feasible, because each new instance only has to be tested against a few clusters.
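To illustrate the clustering idea, here is a minimal sketch in which each cluster is summarized offline by a centroid and a radius, and each new instance is then compared against those few summaries. It assumes the cluster assignment already exists (for example from a k-means run done during the day). The names `fit_clusters` and `outside_all_clusters`, the `slack` factor, and the sample points are all hypothetical.

```python
import math

def fit_clusters(clusters):
    """Offline step: summarize each cluster by its centroid and radius.

    `clusters` maps a label to a list of points (tuples of floats).
    The radius is the largest centroid-to-member distance.
    """
    model = {}
    for label, points in clusters.items():
        dim = len(points[0])
        centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
        radius = max(math.dist(centroid, p) for p in points)
        model[label] = (centroid, radius)
    return model

def outside_all_clusters(point, model, slack=1.5):
    """Online step: outlier if the point is farther than slack * radius
    from every cluster centroid."""
    return all(math.dist(point, c) > slack * r for c, r in model.values())

# Hypothetical, already clustered training data.
clusters = {
    "a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "b": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)],
}
model = fit_clusters(clusters)

print(outside_all_clusters((0.6, 0.4), model))  # close to cluster "a"
print(outside_all_clusters((5.0, 5.0), model))  # between the clusters
```

The online cost is one distance computation per cluster, independent of how many training instances there were, which is what makes this kind of test suitable for real-time use.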

I would be very thankful for some technical advice on which algorithms in RapidMiner could actually detect outliers in real time, especially with spatio-temporal datasets.

Thanks for your help!

Greetings,

Chris

Answers

  • rakirk Member Posts: 29 Contributor II
    One idea I have used in the past is to use the k-nearest neighbors algorithm to build a sort of 'state space' of features that cluster together. Then, in real time, the algorithm can check how close a new event lies to the existing elements, based on the new event's features.

    Let me know how it turns out,

    rk
  • ThomasM Member Posts: 3 Contributor I
    Hi,

    Have you tried Cobweb clustering? It is sensitive to the order of the examples, so it takes the time dimension into account.

    Thomas.
  • ThomasM Member Posts: 3 Contributor I
    One note to add to what I just said: to me, "state space" suggests "Kalman". So, does RapidMiner have an implementation of Kalman filtering?

    Thomas.
  • arunpushkar Member Posts: 8 Contributor II
    Does anybody know how to use the anomaly detection operators in RapidMiner? And can somebody explain how to use LOF, please?