Bug when running ANOVA
Hello,
I want to compare the forecasting performance of 5 methods: ARIMA, Generalized Linear Model, Linear Regression, Support Vector Machines and Neural Networks.
I use a sample dataset of Apple (AAPL) containing 137 consecutive trading days.
Since the ARIMA model is evaluated using the AIC score and I need all of the methods to be evaluated using Root Mean Squared Error (RMSE), I apply the ARIMA trainer to the first 127 days and then ask the ARIMA forecast to predict over a horizon of 10 days, comparing the actual prices with the forecast and calculating the RMSE.
For the other methods it's much easier, since I train each model on the first 127 days and apply it to the last 10 days.
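To make the evaluation step concrete, here is a minimal Python sketch of the RMSE calculation on a holdout window. It is only an illustration outside the actual process: the function name `rmse` and the price values are placeholders I made up, not numbers from the AAPL sample.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder numbers for illustration only, not taken from the AAPL sample.
actual_prices = [100.0, 101.0, 102.0, 103.0]
forecast_prices = [101.0, 101.0, 100.0, 103.0]
holdout_rmse = rmse(actual_prices, forecast_prices)
```

The same function would be applied to the 10 holdout days for every method, so that all five scores are on the same scale.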
Note that I am using grid search to find optimal parameters for all methods. The process runs perfectly, but when I add the ANOVA operator to compare the performances, an error pops up. When I disable the ANOVA and keep only the T-test, everything runs smoothly.
I attach the process (if you want to run this you have to disable the ANOVA operator!), the error and the data sample in a zip file.
Kind regards,
mmarag
Comments
Hi @mmarag,
I have tried removing the range filtering, because you don't really need it (you get your performance measures with cross validation). Additionally, I see no way to compare ARIMA with the others.
Here is my process:
There is no significant difference between the groups, which may be understandable given the short window (intercept or the default node has a big weight). Maybe you need to generate more complex features in order to better fit the training data (larger windows, differentiation). The dataset itself is also very small.
For a data set this small, time series models can work well. You can focus on comparing the predictions and their confidence intervals.
Best regards,
Sebastian
Hello and thanks for the input.
Actually, the cross validation is performed on the training data (i.e. the first 127 days) and is used to help the optimizer choose the best parameters. I need to filter the data because I am interested in the performance on only the last 10 days: I want to ensure that the models meet this data for the first time and haven't seen it before.
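As a rough sketch of the split I mean (plain Python, not the actual process), the series is cut by position without shuffling, so the holdout days always come after every training day. The helper name `chronological_split` and the day indices are placeholders for illustration.

```python
def chronological_split(series, holdout_size):
    """Split a time-ordered list into (train, holdout) without shuffling."""
    if not 0 < holdout_size < len(series):
        raise ValueError("holdout_size must be between 1 and len(series) - 1")
    return series[:-holdout_size], series[-holdout_size:]


# Day indices stand in for the 137 trading days; real rows would be price records.
days = list(range(1, 138))
train_days, holdout_days = chronological_split(days, holdout_size=10)
```

Parameter tuning with cross validation would then only ever touch `train_days`, and the final RMSE is computed on `holdout_days`.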
As regards the training data size, I have also tried it with the copper dataset (it's a ready-made sample in the time series extension) and it works (apart from the ANOVA operator, which crashes).
with regards
Hi,
I tested with the copper dataset and it works if I use the whole sample size. If I take a look at the ANOVA table, I can see that for both cases N = 40. That means that the ANOVA operator seems to get a subset of the results (10 each), and it fails if there are fewer than 10 samples in each group. That doesn't seem to be correct, therefore it must be a bug.
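For reference, here is a minimal sketch of a textbook one-way ANOVA in plain Python. It is not the operator's actual implementation, which is not shown in this thread, and the function name and all values are placeholders. It only illustrates that the F statistic is defined for small groups too, and that N = 40 with 10 results per group would correspond to four groups (my assumption).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least 2 non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation, F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three placeholder groups with only 3 values each: the test is still defined.
small_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
small_f, small_df_b, small_df_w = one_way_anova(small_groups)

# Four placeholder groups of 10 values each, i.e. N = 40.
big_groups = [[float(i + j) for j in range(10)] for i in range(4)]
big_f, big_df_b, big_df_w = one_way_anova(big_groups)
```

With 3 values per group the degrees of freedom are 2 and 6, and with four groups of 10 they are 3 and 36, so nothing in the method itself requires 10 samples per group.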
I am tagging @sgenzer to begin the bug-reporting process, thank you!
Thank you very much for your effort on this.
Thanks @SGolbert for pinging me.
@mmarag so I brought in this process and your Excel file and get no error whatsoever. Can you please help me reproduce?
Scott
Hello @sgenzer,
When the T-test operator is connected to the ANOVA operator, the error that pops up is this:
and the error description is
OK, thanks. Just so I can replicate it exactly, can you please send me a new XML?