Find dependencies in multivariate timeseries

Ozone · May 2010

Hello,

It's my first time using this community. My problem is the following:
I have time series of weather data with various attributes whose correlations with each other vary widely.
I also have one label attribute (the observations), which is forecast by a deterministic physical model.

The task is to identify outliers and to determine in which situations the physical model performs badly, and why. Perhaps some of the other attributes are forecast very badly, and so the label attribute is, too.

I tried several models (linear regression, neural nets, SVM, decision trees, naive Bayes) to predict these outliers. I got good performance, but I don't know how to interpret the results. Maybe this problem is too complex, but the goal is to clearly identify the reason for a specific outlier. At the very least I want to make qualitative statements like "when the wind comes from the north, the probability of an outlier is higher than when it comes from the south".
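A qualitative statement like the wind-direction one above amounts to comparing the conditional outlier rate across groups. The minimal Python sketch below assumes each record has already been reduced to a wind direction in degrees (0 = north, clockwise) and a boolean outlier flag; the four-sector binning and the function names `sector` and `outlier_rate_by_sector` are hypothetical illustrations, not RapidMiner operators.

```python
from collections import defaultdict


def sector(direction_deg: float) -> str:
    """Map a wind direction in degrees (0 = north, clockwise) to one of four sectors.

    The 90-degree sectors are an illustrative assumption; any binning works.
    """
    d = direction_deg % 360
    if d >= 315 or d < 45:
        return "N"
    if d < 135:
        return "E"
    if d < 225:
        return "S"
    return "W"


def outlier_rate_by_sector(records):
    """records: iterable of (wind_direction_deg, is_outlier) pairs.

    Returns {sector: fraction of records in that sector flagged as outliers},
    i.e. an estimate of P(outlier | wind sector).
    """
    counts = defaultdict(lambda: [0, 0])  # sector -> [outlier count, total count]
    for direction, is_outlier in records:
        c = counts[sector(direction)]
        c[0] += int(is_outlier)
        c[1] += 1
    return {s: outliers / total for s, (outliers, total) in counts.items()}


# Hypothetical toy data: (wind direction in degrees, outlier flag)
example = [(10, True), (350, True), (20, False), (180, False), (190, False), (170, True)]
rates = outlier_rate_by_sector(example)
print(rates)
```

On this toy data the northern sector has an outlier rate of 2/3 and the southern sector 1/3. With real data, the sample size per sector matters before reading much into such differences.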

Maybe you have had similar problems or have some ideas for mine.

Thanks a lot for your help. Maybe you can recommend some operators for this problem.

Thomas

IngoRM · May 2010

Hi Thomas,

well, "some operators"? I could recommend the complete RapidMiner package since it is also designed for tasks like yours. I cannot do my consulting work for free here but at least some hints:

- in general I would ask myself: is the physical model a ground truth? If yes, I don't get why there are "outliers" at all;

- if it's deterministic, the outliers are not really outliers but, well, let's call them "unexpected";

- in principle, your basic approach is correct, but there are two ways: mark the outliers as outliers and turn it into a classification task, or just model your label (without any overfitting!) and check for deviations. You could then re-model those deviations if necessary to gain insight into the reasons.
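The second route ("model your label and check for deviations") can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: a single numeric predictor, an ordinary least-squares line as the deliberately simple (hard to overfit) model, and a hypothetical rule that flags a point when its residual exceeds `k` times the residual standard deviation. The function names and the choice of `k` are illustrative only.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def flag_deviations(xs, ys, k=2.0):
    """Flag points whose residual is larger than k residual standard deviations.

    The threshold k is an assumed tuning parameter, not a recommended value.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = statistics.pstdev(residuals)
    return [abs(r) > k * sd for r in residuals]


# Hypothetical data: y = 2x + 1, except one point shifted up by 20
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] += 20
flags = flag_deviations(xs, ys)
print(flags)
```

The resulting flags can then serve as the label for a second, interpretable model, which is the "re-model those deviations" step.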

Cheers,
Ingo

wessel · May 2010

I don't understand the task.

Is it anomaly detection?

IngoRM · May 2010

Could be. But it is also possible that it's just about having a theoretical model together with measurements and the task is to refine the model. This could only be answered by the original author. Let's see if he's still here...

Cheers,
Ingo

Ozone · November 2010

Hey Ingo,

thanks for your help. I was absent a long time but I worked on my problem!

The prediction of the physical model itself depends on a combination of other variables. It is not a statistical model, and it is not fitted to the observations ("truth"). So I split the predictions into good and bad ones and then tried a lot of classification operators again. It works very well, and the new Automatic Classification System helps a lot!
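The split into good and bad predictions followed by an interpretable classifier can be illustrated with a small Python sketch. It assumes a prediction counts as "bad" when its absolute error exceeds a chosen tolerance, and it substitutes a one-level decision stump (a single threshold on one attribute) for the classifiers mentioned in the thread; it is not a reconstruction of the RapidMiner process used here. All names and data are hypothetical.

```python
def label_predictions(forecasts, observations, tolerance):
    """Label each forecast as bad (True) if its absolute error exceeds tolerance."""
    return [abs(f - o) > tolerance for f, o in zip(forecasts, observations)]


def best_stump(values, labels):
    """Find the single rule 'bad if value > t' or 'bad if value <= t' with the
    highest training accuracy.

    Returns (threshold, bad_above, accuracy). A stump is a depth-1 decision
    tree, used here only because its output reads as a qualitative rule.
    """
    n = len(values)
    best = None
    for t in sorted(set(values)):
        above = [v > t for v in values]
        acc_above = sum(a == lab for a, lab in zip(above, labels)) / n
        acc_below = 1 - acc_above  # the opposite rule is right exactly where this one is wrong
        for bad_above, acc in ((True, acc_above), (False, acc_below)):
            if best is None or acc > best[2]:
                best = (t, bad_above, acc)
    return best


# Hypothetical data: an explanatory attribute (e.g. wind speed), model forecasts and observations
wind_speed = [2, 3, 4, 10, 12, 15]
forecasts = [10, 11, 12, 20, 25, 30]
observations = [10.5, 11, 12.2, 14, 18, 21]

labels = label_predictions(forecasts, observations, tolerance=2.0)
stump = best_stump(wind_speed, labels)
print(labels, stump)
```

On this toy data the stump finds "bad if wind speed > 4" with perfect training accuracy, which is the kind of readable statement asked for at the start of the thread. On real data, accuracy should of course be checked on held-out data.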

Thanks

IngoRM · November 2010

Hi Thomas,

that's great news! Glad to hear that things turned out well.

Cheers,
Ingo

Arturo_Lomas · September 2011

Hello Ozone,
Can you tell me which set of operators you used in RapidMiner to build your automatic classification system?
Which meteorological variables did you use for the classification?
Sincerely, Arturo

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Find dependencies in multivariate timeseries

Answers