"Newbie: help with unsupervised anomaly detection with RapidMiner"
Hello,
After I managed to build a project doing data classification, I would like to ask for advice on how to build a project doing "unsupervised anomaly detection".
http://en.wikipedia.org/wiki/Anomaly_detection
I would appreciate a "pointer" to the right model to use, or a tutorial on this topic, as a hint.
My problem... (with some simplifications):
I have a temperature sensor that reports the temperature every minute; 30 days of these readings are my "training data".
I have no idea whether the history I have contains any temperature-related anomalies ("issues"), or when they occurred; I only have the data itself. So classification models aren't relevant, at least at my newbie level of understanding...
Then I have the temperature data for the last hour, also reported every minute.
My goal is to apply a reasonable heuristic that tells me the probability that this "hour" represents an "anomaly" compared to the training data. For now I have some freedom to define "anomaly", but it should reflect real-world scenarios like "too high", "too low", "too volatile" or "too steady".
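One simple way to make "too high", "too low", "too volatile" and "too steady" concrete is to summarize every hour by its mean and standard deviation, and to express the latest hour as z-scores against the hours in the 30-day training data. The Python sketch below assumes the readings are plain lists of floats with one value per minute; the function names, the 60-minute window and the synthetic data are illustrative assumptions. Note that a z-score is a score, not a calibrated probability.

```python
import math
import statistics


def window_features(temps):
    """Summarize one window of per-minute readings by its mean and spread."""
    return {"mean": statistics.fmean(temps), "stdev": statistics.pstdev(temps)}


def score_last_hour(train_temps, recent_temps, window=60):
    """Z-scores of the recent window's mean and stdev against all training windows.

    Large positive "mean" -> too high, large negative -> too low;
    large positive "stdev" -> too volatile, large negative -> too steady.
    """
    # Cut the training series into non-overlapping windows of `window` minutes.
    baseline = [window_features(train_temps[i:i + window])
                for i in range(0, len(train_temps) - window + 1, window)]
    recent = window_features(recent_temps)
    z = {}
    for key in ("mean", "stdev"):
        values = [b[key] for b in baseline]
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values) or 1e-9  # guard against a perfectly flat baseline
        z[key] = (recent[key] - mu) / sigma
    return z


# Synthetic example (not real sensor data): a smooth daily cycle over 30 days.
train = [20 + 3 * math.sin(2 * math.pi * i / 1440) for i in range(30 * 1440)]
# A flat 35-degree hour: warmer and steadier than any training hour.
print(score_last_hour(train, [35.0] * 60))
```

Thresholds such as |z| > 3 are a common starting point, but they would need tuning against what actually counts as an issue for this sensor.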
In a second stage, I will need to analyze the data by day of the week (assuming the temperature changes follow some weekly "trends").
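For the weekly stage, one option is to keep a separate baseline for each (weekday, hour of day) slot, so that a Saturday 10:00 hour is only compared with earlier Saturdays at 10:00. This is a minimal Python sketch assuming one reading per minute and training data that starts on a full hour; the function names and the synthetic data are hypothetical. With only 30 days of history each slot holds just four or five hours, so the standard deviations are rough estimates.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta


def weekday_hour_baseline(start, temps):
    """Map (weekday, hour) -> (mean, stdev) of the hourly mean temperatures in that slot.

    Assumes `start` is on a full hour and `temps` holds one reading per minute.
    """
    slots = defaultdict(list)
    for i in range(0, len(temps) - 59, 60):
        ts = start + timedelta(minutes=i)
        slots[(ts.weekday(), ts.hour)].append(statistics.fmean(temps[i:i + 60]))
    return {slot: (statistics.fmean(v), statistics.pstdev(v)) for slot, v in slots.items()}


def score_hour(baseline, hour_start, recent_temps):
    """Z-score of an hour's mean against the same weekday and hour in the training data."""
    mu, sigma = baseline[(hour_start.weekday(), hour_start.hour)]
    return (statistics.fmean(recent_temps) - mu) / (sigma or 1e-9)


# Synthetic example: weekends run 5 degrees warmer, with a slight drift each week.
start = datetime(2024, 1, 1)  # a Monday
train = [
    20 + 0.1 * (i // (7 * 1440))
    + (5 if (start + timedelta(minutes=i)).weekday() >= 5 else 0)
    for i in range(30 * 1440)
]
base = weekday_hour_baseline(start, train)
print(score_hour(base, datetime(2024, 2, 5, 10), [20.0] * 60))  # ordinary Monday hour
print(score_hour(base, datetime(2024, 2, 3, 10), [20.0] * 60))  # unusually cold Saturday hour
```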
Thanks for any hint,
Max
Best Answer
MariusHelf (RapidMiner Certified Expert)
Hi Max,
you should have a look at the Outlier operators, especially Outlier Detection (LOF). It calculates the Local Outlier Factor for each example: a numeric score where higher values indicate that the example is more likely to be an outlier.
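To show what the Local Outlier Factor measures, here is a small from-scratch Python version of the standard LOF definition (k-distance, reachability distance, local reachability density). It is not RapidMiner's implementation, and its parameters and edge-case handling may differ from the operator's; `lof_scores`, `k=5` and the toy hourly features are assumptions for illustration. Because LOF is distance-based, features on very different scales (degrees vs. a small standard deviation) should usually be normalized first; the toy data skips that step.

```python
import math


def lof_scores(points, k=5):
    """Local Outlier Factor of each point (brute-force O(n^2), Euclidean distance).

    Scores near 1 mean "about as dense as my neighbours"; scores well above 1
    mean the point sits in a sparser region than its neighbours (a likely outlier).
    Simplification: exactly k neighbours are used; ties at the k-th distance are ignored.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance to the neighbours.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


# Toy (mean, stdev) features for 25 ordinary hours plus one suspicious hour.
hours = [(20 + 0.1 * a, 0.10 + 0.01 * b) for a in range(5) for b in range(5)]
hours.append((35.0, 0.0))
scores = lof_scores(hours)
print(max(range(len(hours)), key=scores.__getitem__))  # index of the hour with the highest LOF
```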
You can manually create a label that is true for all examples whose score is above a certain threshold, and false otherwise. If you then train a descriptive model, e.g. a decision tree, that classifies the examples into true or false, the model will show you why the respective examples are outliers.
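The threshold-plus-tree idea can be sketched in a few lines of Python. To stay dependency-free, the descriptive model here is a decision stump, i.e. a decision tree with a single split chosen by misclassification count, standing in for a full decision tree learner; the threshold of 1.5, the function names, the feature names and the toy rows and scores are assumptions.

```python
def label_outliers(scores, threshold=1.5):
    """Manual label: True where the outlier score exceeds the chosen threshold."""
    return [s > threshold for s in scores]


def fit_stump(rows, labels, feature_names):
    """One-level decision tree: the single "feature <= cut" split with the fewest errors.

    Each side of the split predicts its majority label; returns (feature, cut, errors).
    """
    best = None
    for f, name in enumerate(feature_names):
        for cut in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= cut]
            right = [lab for row, lab in zip(rows, labels) if row[f] > cut]
            # Errors = minority-class examples on each side of the split.
            errors = (min(left.count(True), left.count(False))
                      + min(right.count(True), right.count(False)))
            if best is None or errors < best[2]:
                best = (name, cut, errors)
    return best


# Toy hourly (mean, stdev) features with LOF-style scores for each hour.
rows = [(20.0, 0.10), (20.5, 0.20), (21.0, 0.15), (19.5, 0.10), (30.0, 0.10), (31.0, 0.20)]
scores = [1.0, 1.1, 0.9, 1.2, 4.0, 5.0]
labels = label_outliers(scores)
print(fit_stump(rows, labels, ["mean", "stdev"]))  # ('mean', 21.0, 0)
```

Reading the split back, `mean <= 21.0` separates the flagged hours from the rest, i.e. in this toy data the outliers are the hours that were too warm. A real decision tree keeps splitting and can combine several features in its explanation.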
Best regards,
Marius