prediction with svm

mines · May 2021

Does anyone know how to make a prediction for the next ten days with the SVM algorithm in RapidMiner?

BalazsBarany · May 2021

Hi!

Do you want to make a prediction for each of the next ten days, or just for the tenth day?

In the first case you would build a loop with ten iterations, filtering your data accordingly. Essentially, you build a data structure where the value of the selected day is the target variable (label), and you make sure to only use data from 10 days before that or earlier. For example, you could use different averages (7-day, 30-day, a year ago, ...) to capture different aspects of the data.

The "tenth day prediction" is just a special case of this without the loop.

Note: this is what you have to do if you insist on using SVM. There are multiple more or less automatic time series prediction algorithms that do exactly what you want with a lot less effort.

Regards,
Balázs
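
For readers who find the loop easier to follow outside RapidMiner, here is a minimal Python sketch of the idea described above: one training set per horizon, where each label is paired only with values that are at least that many days older. The function name `make_training_rows`, the `n_lags` parameter, and the case numbers are illustrative assumptions, not part of the original process.

```python
def make_training_rows(values, horizon, n_lags=3):
    """Pair each label with inputs that are at least `horizon` days older.

    Hypothetical helper: `values` is a daily series, oldest first.
    """
    rows = []
    for t in range(horizon + n_lags - 1, len(values)):
        newest_allowed = t - horizon  # most recent day the model may see
        features = [values[newest_allowed - k] for k in range(n_lags)]
        rows.append((features, values[t]))
    return rows


# Made-up daily case counts, oldest first
cases = [13, 12, 15, 14, 18, 17, 20, 22, 21, 25, 24, 27, 30, 29, 33, 35]

# One training set per horizon, mirroring the ten loop iterations
training_sets = {h: make_training_rows(cases, h) for h in range(1, 11)}

for h, rows in training_sets.items():
    print(f"horizon {h:2d}: {len(rows)} training rows")
```

Longer horizons leave fewer usable rows, because more of the recent history has to be held back from the inputs.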

mines · May 2021

Hello @BalazsBarany!
I want to make a prediction for each of the next ten days. Can you explain how to create a loop in RapidMiner, or is there any information about that?
Regards

BalazsBarany · May 2021

Hi,

if you look at the operators under Utility/Process Control/Loops, you'll see a lot of different ones.
For this use case I would use Loop Values. It takes an example set with the nominal values (these would be your dates in a textual representation). The current value is available as a macro inside the loop, so you can easily select the data according to it.

Regards,
Balázs
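
As a rough analogue of the Loop Values behaviour described above, the following Python sketch iterates over the distinct date strings and uses the current one to select data, much like the macro inside the loop. The generator `loop_values` is a hypothetical stand-in, not the RapidMiner operator, and only the first two rows come from the thread's example; the others are made up.

```python
# (date as text, cases); the last two rows are invented for illustration
rows = [
    ("2021-05-13", 13),
    ("2021-05-14", 12),
    ("2021-05-15", 15),
    ("2021-05-16", 14),
]


def loop_values(rows):
    """Yield (current_value, subset) once per distinct date string."""
    for current in sorted({date for date, _ in rows}):
        # ISO dates (YYYY-MM-DD) sort correctly as plain text,
        # so a string comparison is enough to select "up to this day"
        subset = [(d, c) for d, c in rows if d <= current]
        yield current, subset


results = {current: len(subset) for current, subset in loop_values(rows)}

for current, n_rows in results.items():
    print(f"current value {current}: {n_rows} rows selected")
```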

mines · May 2021

@BalazsBarany But should I use that after applying a model, or should I do it inside cross validation?
Thank you.

BalazsBarany · May 2021

Hi,

filtering the data for building the models happens before you build the model. You then apply the model to today's data.

E. g. if you want a prediction for the 7th day from now, you would filter out data from the last 6 or 7 days (depending on when you get the value for the current day) and build the model from that, with "today" being the target (label). This model can be applied to the unfiltered data up until today and it gives you the prediction for today + 7 days.

The point is to throw away data that you can't know yet for your prediction. You know the history and possibly today's value (maybe only in the afternoon, depending on the use case). You don't know tomorrow or the day after tomorrow, but you'd like to predict a future value. So you build the model from what you *can* know at the time of the model application, and you do that by filtering the past data accordingly.

Regards,

Balázs
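
The train-then-apply split described above can be sketched in Python as follows. To keep the example free of dependencies, a hand-written least-squares line stands in for the SVM; `fit_line`, `HORIZON`, and the case numbers are assumptions made only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (stand-in for the real learner)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


HORIZON = 7

# Made-up daily values, oldest first; the last entry is "today"
cases = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36]

# Training: the input is the value HORIZON days before the label day,
# so the model never sees anything it could not know at prediction time
xs = cases[:-HORIZON]
ys = cases[HORIZON:]
a, b = fit_line(xs, ys)

# Application: feed today's value to get the estimate for today + HORIZON
prediction = a * cases[-1] + b
print(f"prediction for today + {HORIZON} days: {prediction:.1f}")
```

The important part is the slicing, not the learner: the same pairs of inputs and labels could be handed to any regression algorithm.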

mines · May 2021

@BalazsBarany thank you for your help. But when I use Loop Values, should I use the date column (which has all my dates) or choose the column that I want to predict? My goal is to make a prediction with the SVM algorithm, and I want to predict the number of cases of a disease for the next 10 days.
Best regards

BalazsBarany · May 2021

Hi,

usually you would use the time series operators to build columns from the data history.

You probably have something like this:

Date | Cases
2021-05-13 | 13
2021-05-14 | 12
...

With the time series operators you can build moving averages over 3, 7, 14, 30 etc. days, or take the value from 10 days earlier etc. You might have seasonality in the data; in that case you would also care about the values from 1 or 2 years before, but probably not with a new disease. Combinations of the values (such as differences between averages) are also interesting for capturing a trend.

So the modeling dataset would be something like this:

Date | Cases date-1 | Cases date-2 | Avg 7 days | Avg 14 days | Avg14 - Avg7 | etc.

You would then use the loop to filter the data in the way I described: for the 10-day prediction you would use the most recent data as the label, but all the data that go into the model are filtered 10 days back in time.

Cheers,
Balázs
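
A modeling table of the kind outlined above could be assembled roughly like this in plain Python. This is only a sketch under assumptions: the helper names, the column names, the 30 days of made-up data, and the choice of a 10-day horizon are all illustrative, and in RapidMiner the time series operators would do this work.

```python
from datetime import date, timedelta


def moving_average(values, end, window):
    """Mean of the `window` values ending at index `end` (inclusive)."""
    chunk = values[end - window + 1 : end + 1]
    return sum(chunk) / window


def build_rows(dates, cases, horizon, short_window=7, long_window=14):
    """One row per label day; every input is at least `horizon` days older."""
    rows = []
    for t in range(horizon + long_window - 1, len(cases)):
        ref = t - horizon  # newest day the model is allowed to see
        avg_short = moving_average(cases, ref, short_window)
        avg_long = moving_average(cases, ref, long_window)
        rows.append({
            "date": dates[t],
            "cases_ref": cases[ref],
            "cases_ref_minus_1": cases[ref - 1],
            "avg_7": avg_short,
            "avg_14": avg_long,
            "avg_14_minus_avg_7": avg_long - avg_short,
            "label": cases[t],
        })
    return rows


# Made-up data: 30 consecutive days with steadily rising case counts
start = date(2021, 4, 1)
dates = [(start + timedelta(days=i)).isoformat() for i in range(30)]
cases = list(range(1, 31))

rows = build_rows(dates, cases, horizon=10)
print(len(rows), "modeling rows; first row:", rows[0])
```

Note that the first usable label day is pushed back by both the horizon and the longest window, so the modeling table is shorter than the raw data.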

mines · May 2021

But I need to use the SVM algorithm. Can I use both for the prediction?
Best regards,

BalazsBarany · May 2021

Yes, SVM works well with a large number of attributes.

I described the preprocessing necessary for creating the data structure that you use for modeling and validation. The modeling algorithm is your choice.

Regards,
Balázs

mines · May 2021

I built a model, used the grid optimization, and then applied the model again. My dataset has 136 rows, but the final output is missing some of the data. I don't understand why. Can you help me, @BalazsBarany?
Best regards,

BalazsBarany · May 2021

Hi,

you can set breakpoints (after or before execution) on operators to see what goes into them and what comes out of them. That way you can easily see where you lose data.

Regards,
Balázs
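
Outside RapidMiner, the same "look at what goes in and what comes out" check can be approximated by recording row counts around each step. The sketch below is a hypothetical Python analogue of the breakpoint approach, not a RapidMiner feature; the step functions and the third data row are made up.

```python
def traced(step_name, step, rows, log):
    """Run one step and record how many rows went in and came out."""
    result = step(rows)
    log.append((step_name, len(rows), len(result)))
    return result


def drop_missing(rows):
    """Example step: discard rows that contain a missing value."""
    return [r for r in rows if None not in r.values()]


def keep_recent(rows):
    """Example step: keep rows from 2021-05-14 onwards."""
    return [r for r in rows if r["date"] >= "2021-05-14"]


rows = [
    {"date": "2021-05-13", "cases": 13},
    {"date": "2021-05-14", "cases": 12},
    {"date": "2021-05-15", "cases": None},  # invented row with a missing value
]

log = []
rows = traced("drop_missing", drop_missing, rows, log)
rows = traced("keep_recent", keep_recent, rows, log)

for name, n_in, n_out in log:
    print(f"{name}: {n_in} rows in, {n_out} rows out")
```

Any step where the second number is smaller than the first is a place where data is being dropped.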

timothy_rij · February 2023

@mines, did you end up getting this to work? I am trying to do something similar but there are no tutorial videos on using loops or setting up a similar process.
