Find dependencies in multivariate timeseries

Ozone (Member, Posts: 17, Contributor II)
Hello,

It's my first time using this community. My problem is:
I have time series data (weather data) with various attributes, which are correlated with one another to very different degrees.
I also have one label attribute (observations), which is forecast by a deterministic physical model.

The task is to identify outliers and to determine in which situations the physical model performs badly, and why. Maybe some of the other attributes are forecast very badly, and so the label attribute is, too.

I tried some models (linear regression, neural nets, SVM, decision trees, naive Bayes) to predict these outliers. I got some good performance, but I don't know how to interpret the results. Maybe this problem is too complex, but the goal is to clearly identify the reason for a specific outlier. At the very least, I want to make some qualitative statements like "when the wind comes from the north, the probability of outliers is higher than when the wind comes from the south".

Maybe you have some similar problems or ideas for my problem.

Thanks a lot for your help. Maybe you can recommend some operators for this problem.

Thomas
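
The qualitative statement asked for above (outlier probability by wind direction) could be sketched roughly as follows. This is only an illustration in plain Python, not a RapidMiner process: the function name, the error threshold, and the toy records are all assumptions made up for the example, and "outlier" is taken here to mean a forecast whose absolute error exceeds a fixed threshold.

```python
from collections import defaultdict

def outlier_rate_by_group(records, threshold):
    """Return, per group, the fraction of records whose absolute
    forecast error |forecast - observed| exceeds the threshold."""
    counts = defaultdict(lambda: [0, 0])  # group -> [outliers, total]
    for group, forecast, observed in records:
        counts[group][1] += 1
        if abs(forecast - observed) > threshold:
            counts[group][0] += 1
    return {g: out / total for g, (out, total) in counts.items()}

# Hypothetical toy data: (wind direction, model forecast, observation)
records = [
    ("north", 10.0, 14.5),
    ("north", 12.0, 12.4),
    ("north", 9.0, 13.2),
    ("south", 11.0, 11.3),
    ("south", 10.0, 10.9),
    ("south", 13.0, 16.5),
    ("south", 12.0, 12.2),
]

rates = outlier_rate_by_group(records, threshold=3.0)
for direction, rate in sorted(rates.items()):
    print(f"{direction}: {rate:.2f}")
```

With real data, the same grouping could be repeated for other attributes (season, pressure range, and so on) to see where the outlier rate differs most between groups.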

Answers

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, Posts: 1,751, RM Founder)
    Hi Thomas,

    Well, "some operators"? I could recommend the complete RapidMiner package, since it is also designed for tasks like yours. I cannot do my consulting work for free here, but here are at least some hints:

    - in general I would ask myself: is the physical model a ground truth? If yes, I don't get why there are "outliers" at all;

    - if it's deterministic, the outliers are not really outliers but, well, let's call them "unexpected";

    - in principle, your basic approach is correct, but there are two ways: either mark the outliers as outliers and make a classification task out of it, or just model your label (without any overfitting!) and check for deviations. You could re-model those deviations if necessary to get insight into the reasons.

    Cheers,
    Ingo
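
The deviation check described in the reply above could be sketched like this. It is a minimal plain-Python illustration under stated assumptions, not the method used in the thread: the function name, the rule "more than k standard deviations from the mean residual", the value k = 2, and the toy numbers are all hypothetical.

```python
import statistics

def flag_deviations(forecast, observed, k=2.0):
    """Compute residuals (observed - forecast) and flag those lying
    more than k population standard deviations from the mean residual."""
    residuals = [o - f for f, o in zip(forecast, observed)]
    mean = statistics.fmean(residuals)
    std = statistics.pstdev(residuals)
    flags = [abs(r - mean) > k * std for r in residuals]
    return residuals, flags

# Hypothetical toy series: physical-model forecast vs. observation
forecast = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
observed = [10.2, 10.9, 12.1, 12.8, 14.1, 15.0, 21.0, 16.9]

residuals, flags = flag_deviations(forecast, observed)
unexpected = [i for i, is_flagged in enumerate(flags) if is_flagged]
print(unexpected)
```

The two outputs map onto the two routes mentioned above: `flags` can serve as a binary label for a classification task, while `residuals` can serve as a numeric target if the deviations themselves are to be re-modeled from the other attributes.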
  • wessel (Member, Posts: 537, Maven)
    I don't understand the task.

    Is it anomaly detection?
  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, Posts: 1,751, RM Founder)
    Could be. But it is also possible that it's just about having a theoretical model together with measurements and the task is to refine the model. This could only be answered by the original author. Let's see if he's still here...

    Cheers,
    Ingo
  • Ozone (Member, Posts: 17, Contributor II)
    Hey Ingo,

    Thanks for your help. I was away for a long time, but I kept working on my problem!

    The prediction of the physical model in turn depends on the combination of other variables. It is not a statistical model, and it is not fitted to observations ("truth"). So I tried to distinguish between good and bad predictions and then tried a lot of classification operators again. It works very well, and the new Automatic Classification System helps a lot!

    Thanks
  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, Posts: 1,751, RM Founder)
    Hi Thomas,

    That is great news! Glad to hear that things turned out well.

    Cheers,
    Ingo
  • Arturo_Lomas (Member, Posts: 1, Contributor I)
    Hello Ozone,
    Could you tell me which set of operators you used in RapidMiner to build your automatic classification system?
    Which meteorological variables did you use for classification?
    Sincerely, Arturo