Answers
Do you want to make a prediction for each of the next ten days, or just for the tenth day?
In the first case you would build a loop with ten iterations, filtering your data accordingly. Essentially, you build a data structure where the value of the selected day is the target variable (label), and you make sure to use only data from at least 10 days before that day as inputs. For example, you can use different averages (7-day, 30-day, a year ago, ...) to capture different aspects of the data.
The "tenth day prediction" is just a special case of this without the loop.
Note: this is what you have to do if you insist on using SVM. There are multiple more or less automatic time series prediction algorithms that do exactly what you want with a lot less effort.
Regards,
Balázs
I want to make a prediction for each of the next ten days. Can you explain how to create a loop in RapidMiner, or is there any information about that?
Regards
If you look at the operators under Utility/Process Control/Loops, you'll see a lot of different ones.
For this use case I would use Loop Values. It takes an example set with the nominal values (these would be your dates in a textual representation). The current value is available as a macro inside the loop, so you can easily select the data according to it.
Regards,
Balázs
Thank you.
Filtering the data for building the models happens before you build the model. You then apply the model to today's data.
E.g., if you want a prediction for the 7th day from now, you would filter out data from the last 6 or 7 days (depending on when you get the value for the current day) and build the model from that, with "today" being the target (label). This model can be applied to the unfiltered data up until today, and it gives you the prediction for today + 7 days.
The point is to throw away data that you can't know yet for your prediction. You know the history and possibly today's value (maybe only in the afternoon, depending on the use case). You don't know tomorrow or the day after tomorrow, but you'd like to predict a future value. So you build the model from what you *can* know at the time of the model application, and you do that by filtering the past data accordingly.
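To make the filtering idea concrete outside of RapidMiner, here is a minimal Python sketch. It is only an illustration under assumptions: the function name `training_pairs` and the toy series are made up, and the only input per row is the single value known on the prediction day. Each training row pairs that value with the value observed `horizon` days later, so nothing from the days in between leaks into the model.

```python
from datetime import date, timedelta

def training_pairs(series, horizon):
    """Pair each day's known value with the value `horizon` days later.

    `series` maps date -> observed value (hypothetical toy structure).
    Days whose label would lie in the future are skipped.
    """
    pairs = []
    for day in sorted(series):
        target_day = day + timedelta(days=horizon)
        if target_day in series:  # the label must already be observed
            pairs.append((day, series[day], series[target_day]))
    return pairs

# Toy data: ten consecutive days with values 10, 11, ..., 19.
start = date(2021, 5, 1)
series = {start + timedelta(days=i): 10 + i for i in range(10)}

# Training rows for a 7-day-ahead model: (input day, known value, label).
pairs = training_pairs(series, horizon=7)
```

With ten days of history and a horizon of 7, only the first three days have a known label, so the training set has three rows. A model trained on these pairs would then be applied to the most recent day to get the value for today + 7 days.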
Regards,
Balázs
Best regards
Usually you would use the time series operators to build columns from the data history.
You probably have something like this:
Date | Cases
2021-05-13 | 13
2021-05-14 | 12
...
With the time series operators you can build moving averages over 3, 7, 14, 30 etc. days, or take the value from 10 days earlier, and so on. If your data has a seasonality, you would also look at the values from 1 or 2 years before, but that is probably not the case with a new disease. Combinations of these values (such as differences between averages) are also useful for capturing a trend.
So the modeling dataset would be something like this:
Date | Cases date-1 | Cases date-2 | Avg 7 days | Avg 14 days | Avg14 - Avg7 | etc.
You would then use the loop to filter data in a way I described: for the 10 days prediction you would use the most recent data as the label, but all the data that go into the model are filtered 10 days back in time.
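As a rough illustration of that table and loop in plain Python (not RapidMiner operators), the sketch below builds one modeling dataset per horizon from 1 to 10 days. The names `moving_average`, `build_rows`, and the feature keys are hypothetical, the series is toy data, and the choice of features (one lagged value, 7-day and 14-day averages, and their difference) is just an assumption based on the columns above.

```python
def moving_average(values, end, window):
    """Mean of the `window` values ending at index `end` (inclusive)."""
    chunk = values[end - window + 1 : end + 1]
    return sum(chunk) / window

def build_rows(values, horizon, longest_window=14):
    """Build modeling rows where all inputs are at least `horizon` days old."""
    rows = []
    for t in range(len(values)):
        cutoff = t - horizon  # most recent day allowed as model input
        if cutoff < longest_window - 1:
            continue  # not enough history for the longest average
        avg7 = moving_average(values, cutoff, 7)
        avg14 = moving_average(values, cutoff, 14)
        rows.append({
            "label": values[t],           # value of the selected day
            "cases_lag": values[cutoff],  # latest value known `horizon` days before
            "avg_7": avg7,
            "avg_14": avg14,
            "avg14_minus_avg7": avg14 - avg7,  # simple trend indicator
        })
    return rows

# Toy series of 40 daily values: 1, 2, ..., 40.
values = list(range(1, 41))

# One dataset per prediction horizon, mirroring the ten loop iterations.
datasets = {horizon: build_rows(values, horizon) for horizon in range(1, 11)}
```

Each entry in `datasets` could then be used to train a separate model with whatever learner you choose. Longer horizons leave fewer usable rows, because more of the recent history has to be excluded from the inputs.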
Cheers,
Balázs
Best regards,
I described the preprocessing necessary for creating the data structure that you use for modeling and validation. The modeling algorithm is your choice.
Regards,
Balázs
Best regards,
You can set breakpoints (before or after execution) on operators to see what goes into them and what comes out of them. That way you can easily see where you lose data.
Regards,
Balázs