Find dependencies in multivariate timeseries
Hello,
It's my first time using this community. My problem is:
I have time series data with various attributes that are correlated with each other to very different degrees (time series of weather data).
I have one label attribute (observations), which is forecast by a deterministic physical model.
The task is to identify outliers and to determine in which situations the physical model performs badly, and why. Maybe some of the other attributes are forecast very badly, and so the label attribute is, too.
I tried some models (linear regression, neural nets, SVM, decision trees, naive Bayes) to predict these outliers. I got good performance, but I don't know how to interpret the results. Maybe this problem is too complex, but the goal is to clearly identify the reason for a specific outlier. At the very least I want to make qualitative statements like "when the wind comes from the north, the probability of outliers is higher than when the wind comes from the south".
Maybe you have some similar problems or ideas for my problem.
Thanks a lot for your help. Maybe you can recommend some operators for this problem.
Thomas
Answers
Well, "some operators"? I could recommend the complete RapidMiner package, since it is designed for tasks like yours. I cannot do my consulting work for free here, but here are at least some hints:
- In general I would ask myself: is the physical model a ground truth? If yes, I don't get why there are "outliers" at all.
- If it's deterministic, the outliers are not really outliers but, well, let's call them "unexpected".
- In principle, your basic approach is correct, but there are two ways: mark the outliers as outliers and make a classification task out of it, or just model your label (without any overfitting!) and check for deviations. You could re-model those deviations if necessary to get insight into the reasons.
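To illustrate the second route, here is a minimal sketch in plain Python (not a RapidMiner process). It assumes the deviation is simply the absolute difference between observation and forecast, and it re-models that deviation with a single-split regression stump so the result stays easy to read. The attribute names (`wind_speed`, `humidity`), the helper `best_stump`, and the toy rows are all hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_stump(rows, attributes, target):
    """Find the one attribute/threshold split that best explains `target`.

    Returns (attribute, threshold, mean_at_or_below, mean_above).
    """
    best = None
    for attr in attributes:
        values = sorted({r[attr] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[target] for r in rows if r[attr] <= t]
            right = [r[target] for r in rows if r[attr] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, attr, t, mean(left), mean(right))
    if best is None:
        raise ValueError("no attribute has two distinct values")
    return best[1:]


# Hypothetical toy data: observation vs. physical-model forecast.
rows = [
    {"wind_speed": 2, "humidity": 60, "observed": 10.0, "forecast": 10.5},
    {"wind_speed": 3, "humidity": 80, "observed": 11.0, "forecast": 10.0},
    {"wind_speed": 4, "humidity": 55, "observed": 9.0, "forecast": 9.5},
    {"wind_speed": 9, "humidity": 70, "observed": 14.0, "forecast": 10.0},
    {"wind_speed": 10, "humidity": 65, "observed": 15.0, "forecast": 10.5},
    {"wind_speed": 12, "humidity": 75, "observed": 8.0, "forecast": 13.0},
]

# Assumption: the deviation of interest is |observed - forecast|.
for r in rows:
    r["abs_dev"] = abs(r["observed"] - r["forecast"])

attr, threshold, mean_low, mean_high = best_stump(
    rows, ["wind_speed", "humidity"], "abs_dev"
)
print(f"{attr} <= {threshold}: mean deviation {mean_low:.2f}")
print(f"{attr} >  {threshold}: mean deviation {mean_high:.2f}")
```

A single split is deliberately crude. The same idea extends to a shallow tree, as long as the tree stays small enough to read as a set of rules.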
Cheers,
Ingo
Is it anomaly detection?
Cheers,
Ingo
Thanks for your help. I was absent for a long time, but I kept working on my problem!
The prediction of the physical model depends, in turn, on the combination of other variables. It is not a statistical model and it is not fitted to observations ("truth"). So I separated the predictions into good and bad ones and then tried a lot of classification operators again. It works very well, and the new Automatic Classification System helps a lot!
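For anyone wanting to try the good/bad split outside RapidMiner, here is a rough sketch in plain Python. It assumes a prediction counts as "bad" when the absolute error exceeds a fixed threshold, which is my assumption for illustration and not necessarily the rule I used. The threshold value, the four-sector binning, and the sample records are hypothetical too.

```python
from collections import defaultdict


def wind_sector(direction_deg):
    """Map a wind direction in degrees to one of four 90-degree sectors."""
    sectors = ["north", "east", "south", "west"]
    # Shift by 45 degrees so that "north" covers 315..45 degrees.
    return sectors[int(((direction_deg % 360) + 45) // 90) % 4]


def bad_rate_by_sector(records, threshold):
    """Return the share of bad predictions per wind sector.

    `records` holds (wind_direction_deg, observed, forecast) tuples.
    A prediction is "bad" if |observed - forecast| > threshold.
    """
    counts = defaultdict(lambda: [0, 0])  # sector -> [bad, total]
    for direction_deg, observed, forecast in records:
        sector = wind_sector(direction_deg)
        counts[sector][1] += 1
        if abs(observed - forecast) > threshold:
            counts[sector][0] += 1
    return {sector: bad / total for sector, (bad, total) in counts.items()}


# Hypothetical sample: (wind direction in degrees, observed, forecast).
records = [
    (350, 12.0, 8.0),
    (10, 11.0, 7.5),
    (20, 9.0, 9.5),
    (170, 10.0, 10.5),
    (190, 9.5, 9.0),
    (200, 13.0, 9.0),
]

rates = bad_rate_by_sector(records, threshold=2.0)
for sector, rate in sorted(rates.items()):
    print(f"{sector}: {rate:.0%} bad predictions")
```

Comparing the rates per sector gives exactly the kind of qualitative statement I was after, such as whether bad predictions are more frequent with northerly than with southerly wind.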
Thanks
That is great news! Glad to hear that things turned out well.
Cheers,
Ingo
Could you tell me which set of operators you used in RapidMiner to build your automatic classification system?
Which meteorological variables did you use for the classification?
Sincerely, Arturo