# Find dependencies in multivariate timeseries

Hello,

It's my first time using this community. My problem is:

I have time series data (weather data) with various attributes whose correlations with each other vary widely.

Now I have one label attribute (observations) which is forecast by a deterministic physical model.

The task is to identify outliers and to determine in which situations the physical model performs badly and why. Maybe some of the other attributes are forecast very badly, and so the label attribute is, too.

I tried some models (linear regression, neural nets, SVM, decision trees, naive Bayes) to predict these outliers. I got some good performance, but I don't know how to interpret these results. Maybe this problem is too complex, but the goal is to clearly identify the reason for a specific outlier. At the very least I want to make some qualitative statements like "when wind comes from north, the probability for outliers is higher than when wind comes from south".

Maybe you have had similar problems or have ideas for mine.

Thanks a lot for your help. Maybe you can recommend some operators for this problem.

Thomas

## Answers

1,751 RM Founder

Well, "some operators"? I could recommend the complete RapidMiner package, since it is also designed for tasks like yours. I cannot do my consulting work for free here, but here are at least some hints:

- in general I would ask myself: is the physical model a ground truth? If yes, I don't get why there are "outliers" at all;

- if it's deterministic, the outliers are not really outliers but, well, let's call them "unexpected";

- in principle, your basic approach is correct, but there are two ways: mark the outliers as outliers and make a classification task out of it, or just model your label (without any overfitting!) and check for deviations. You could re-model those deviations if necessary to get insight into the reasons.

Cheers,

Ingo
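The two routes in the reply above share a first step: turning the gap between observation and forecast into a per-time-step label. A minimal sketch in plain Python, not taken from the thread and not a RapidMiner process, is shown here. The function name `flag_unexpected`, the threshold `k`, and the temperature values are all hypothetical, and the robust z-score (median and MAD) is just one common choice for deciding what counts as "unexpected".

```python
from statistics import median


def flag_unexpected(observed, forecast, k=3.0):
    """Return forecast errors and a True/False flag per time step.

    A step is flagged when its error is far from the typical error,
    measured with a robust z-score: |error - median| / (1.4826 * MAD) > k.
    """
    errors = [o - f for o, f in zip(observed, forecast)]
    center = median(errors)
    mad = median(abs(e - center) for e in errors)
    scale = 1.4826 * mad  # rescales MAD to a standard deviation for normal data
    if scale == 0:
        # All errors are (almost) identical, so nothing stands out.
        return errors, [False] * len(errors)
    flags = [abs(e - center) / scale > k for e in errors]
    return errors, flags


# Hypothetical values: observed temperature vs. physical-model forecast.
observed = [10.1, 11.0, 12.2, 11.8, 18.5, 12.0, 11.5]
forecast = [10.0, 11.2, 12.0, 12.0, 12.1, 11.9, 11.6]

errors, flags = flag_unexpected(observed, forecast)
print(flags)
```

The `flags` list can serve as the class label for the classification route, while `errors` is the numeric target if the deviations themselves are re-modelled.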

537 Maven

Is it anomaly detection?

1,751 RM Founder

Cheers,

Ingo

17 Contributor II

Thanks for your help. I was away for a long time, but I have been working on my problem!

The prediction of the physical model depends in turn on the combination of other variables. It is not a statistical model and it is not fitted to observations ("truth"). So I tried to distinguish between good and bad predictions and then tried a lot of classification operators again. It works very well, and the new Automatic Classification System helps a lot!

Thanks
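Once predictions are split into good and bad, the kind of qualitative statement asked for in the question ("outliers are more likely with north wind than with south wind") can be checked with a simple grouped rate. The sketch below is an illustration only: `wind_sector`, `outlier_rate_by_sector`, the four-sector split, and the sample values are assumptions, not something described in the thread.

```python
from collections import defaultdict

SECTORS = ["N", "E", "S", "W"]


def wind_sector(direction_deg):
    """Map a wind direction in degrees (0 = north, clockwise) to N/E/S/W."""
    return SECTORS[int(((direction_deg % 360) + 45) // 90) % 4]


def outlier_rate_by_sector(directions, is_outlier):
    """Return the share of flagged time steps for each wind sector seen."""
    counts = defaultdict(lambda: [0, 0])  # sector -> [outliers, total]
    for deg, flag in zip(directions, is_outlier):
        entry = counts[wind_sector(deg)]
        entry[0] += int(flag)
        entry[1] += 1
    return {sector: bad / total for sector, (bad, total) in counts.items()}


# Hypothetical wind directions and "bad prediction" flags per time step.
directions = [350, 10, 5, 180, 170, 190, 200, 90]
is_outlier = [True, True, False, False, False, True, False, False]

rates = outlier_rate_by_sector(directions, is_outlier)
for sector in SECTORS:
    if sector in rates:
        print(sector, round(rates[sector], 2))
```

A higher rate in one sector is only a hint about where the physical model struggles; with few samples per sector, or with wind direction correlated with other attributes, it does not by itself explain an individual outlier.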

1,751 RM Founder

That is great news! Glad to hear that things turned out well.

Cheers,

Ingo

1 Contributor I

Could you tell me which set of operators you used in RapidMiner to build your automatic classification system?

Which meteorological variables did you use for classification?

Sincerely, Arturo