Prediction (Forecasting) with RM
IngoRM
Original message from SourceForge forum at http://sourceforge.net/forum/forum.php?thread_id=2019003&forum_id=390413
Hi,
I have some electricity data for the past 1 year. Using this data I want to predict data for the next 1 month. Is this possible with RM?
Can somebody help me in this regard?
Thanks in advance,
Swapk
Answer by Ingo Mierswa:
Hello Swapk,
in principle, tasks like this are exactly what RapidMiner was made for. You could use windowing to window the past data and create prediction models on the windows for different prediction horizons. Then all models are applied to the last available window, and the predictions are appended to the series as predictions.
Cheers,
Ingo
Question by Gladys:
Hello Ingo:
How can we implement this windowing scheme in RapidMiner?
Best Regards,
Gladys
Answer by Ingo:
Hi Gladys,
the basic idea is to use a windowing operator like in the following process:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="number_of_attributes" value="1"/>
<parameter key="target_function" value="sum"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="label"/>
</operator>
<operator name="Series2WindowExamples" class="Series2WindowExamples">
<parameter key="series_representation" value="encode_series_by_examples"/>
<parameter key="window_size" value="10"/>
</operator>
<operator name="XValidation" class="XValidation" expanded="yes">
<operator name="LibSVMLearner" class="LibSVMLearner">
<list key="class_weights">
</list>
<parameter key="svm_type" value="epsilon-SVR"/>
</operator>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="Performance" class="Performance">
</operator>
</operator>
</operator>
</operator>
There is also an operator for multivariate windowing and there are sliding window validations (backtesting) which are more appropriate for this type of analysis. Just try to play around in this field!
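For the backtesting, a minimal sketch assuming the SlidingWindowValidation operator from the value series plugin (the window parameter names are from memory and may differ in your version) could replace the XValidation above:
<operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
<!-- train on a block of past values, test on the block that follows,
     then slide forward: this respects the temporal order of the data -->
<parameter key="training_window_width" value="40"/>
<parameter key="test_window_width" value="10"/>
<operator name="LibSVMLearner" class="LibSVMLearner">
<list key="class_weights">
</list>
<parameter key="svm_type" value="epsilon-SVR"/>
</operator>
<operator name="ApplierChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="Performance" class="Performance">
</operator>
</operator>
</operator>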
Cheers,
Ingo
Answers
Hi Marcel,
the basic idea is pretty simple. Let's say you have a series of values (univariate case, i.e. only one dimension):
v1
v2
v3
v4
v5
v6
...
v100
The task now is to learn from the past to predict a value some time in the future. To solve this task, we employ a windowing approach. The first question is: how long should the history be that we look at? This is the width of the windows. Let's say we regard a history of 5 values for each prediction. The second question is: how far do we want to look into the future? For the sake of simplicity, let's say we just want to predict the next value (i.e. we use a prediction horizon of 1). A windowing will then transform the data set like this (using a step size of 1):
att1 att2 att3 att4 att5 label
v1 v2 v3 v4 v5 v6
v2 v3 v4 v5 v6 v7
v3 v4 v5 v6 v7 v8
...
v95 v96 v97 v98 v99 v100
The first five attributes are the history which is taken into account as attributes / variables / features to learn from. The label is the value which should be predicted. It is simply the next value after the last value of the window (since we chose a horizon of 1).
On this new data set, you can simply use any regression learning technique you want. Together with strong learners like SVM, this method often clearly outperforms classical methods like ARMA / ARIMA and delivers better and more robust results than neural networks for time series predictions. And with RapidMiner, you can easily apply all preprocessing techniques, extract features, create preprocessing models and perform fair evaluations (backtesting!).
More information about this windowing approach can be found in the master thesis of Stefan Rüping (in German only, sorry):
http://www-ai.cs.uni-dortmund.de/auto?self=$Publication_1048264721699
and I think also in this paper:
http://www-ai.cs.uni-dortmund.de/auto?self=$Publication_1059736767197
Of course you could get a lot more information about univariate and multivariate (time) series predictions in our training courses ;-)
Cheers,
Ingo
Would you suggest any special technique for predicting stock price time series?
I am absolutely sure that you approach this issue in a training course at Rapid-i. Unfortunately, I won't be able to fly to Europe until the end of the year. I am a Brazilian mathematician/entrepreneur looking forward to beating the market using time series prediction.
Hope to meet your team in the future,
Best greetings from sunny Rio,
Braulio
Do you have training facilities in France? It is somewhat difficult to come over to Dortmund! On the other hand, your answers on this forum are so good that I will pose you another question: How do I put the records in the input file? Most recent date first or oldest date first? Does it matter in which order I put the records in the time series input file?
Thanks a lot
The basic idea is always the same: use windowing to transform the series data into features describing the history of the current time point (the windowing approach described above). Then add indicators extracted from the complete series so far, or only from the regarded history, or from any other time window (for example with the operators from the value series plugin, merged in with the ExampleSetJoin operator). Then add additional features describing economics etc. (again with ExampleSetJoin). Learn the prediction model from the complete aggregated feature set and predict either the actual value or triggers for buying or selling. And you will see: sometimes you beat the market, sometimes you won't. So it might be important to optimize the model and the preprocessing taking the costs into account. For the latter, the support by RapidMiner could be better, but it would be quite easy to develop such a cost evaluation operator yourself (or have it developed).
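As a rough sketch of such a pipeline (the file names and the indicator set are placeholders, and ExampleSetJoin assumes both example sets carry matching id attributes):
<operator name="Root" class="Process" expanded="yes">
<!-- the raw price series, oldest date first -->
<operator name="PriceSource" class="ExcelExampleSource">
<parameter key="excel_file" value="prices.xls"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="label_dimension" value="0"/>
<parameter key="series_representation" value="encode_series_by_examples"/>
<parameter key="window_size" value="5"/>
</operator>
<!-- additional indicators / economic features, joined via the id attribute -->
<operator name="IndicatorSource" class="ExcelExampleSource">
<parameter key="excel_file" value="indicators.xls"/>
</operator>
<operator name="ExampleSetJoin" class="ExampleSetJoin">
</operator>
<!-- learn the regression model on the aggregated feature set -->
<operator name="LibSVMLearner" class="LibSVMLearner">
<list key="class_weights">
</list>
<parameter key="svm_type" value="epsilon-SVR"/>
</operator>
</operator>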
Your German is great, by the way! Of course this would be slightly less impressive if you came from here (or lived here for some time ;-). I wish I could send you sunny greetings as well, but actually it's raining cats and dogs right now...
Maybe we meet later this year. I would be looking forward to this.
Cheers,
Ingo
http://rapid-i.com/content/view/110/125/
http://rapid-i.com/content/view/111/126/
It is the oldest date first if you follow the format described above, i.e.
v1
v2
...
v100
...
Hope that helps,
Ingo
I am also interested in predicting time series values using the windowing process. I am, however, a bit stuck at the preprocessing stage. If I have too many variables, what would be the best preprocessing algorithm or process to use?
By the way, thanks for the great replies.
Deon
if you have more than one variable, the operator "MultivariateSeries2WindowExamples" has to be used instead of "Series2WindowExamples", which only works for windowing a univariate (i.e. single attribute) time series. The basic approach is the same, but you have to define which of the columns should actually be predicted.
So let's say you have three columns with series data with 100 time points like here:
u1 v1 w1
u2 v2 w2
u3 v3 w3
u4 v4 w4
u5 v5 w5
u6 v6 w6
...
u100 v100 w100
The task now again is to learn from the past to predict a value some time in the future - but now the history of all columns should be taken into account, not only a single one. To solve this task, we employ a windowing approach as described above for the univariate case. Let's say you want to predict the value of the middle (v) column; then the result of the windowing (window width 5, step size 1, horizon 1) will look like (the last column being the label):
u1 u2 u3 u4 u5 v1 v2 v3 v4 v5 w1 w2 w3 w4 w5 v6
u2 u3 u4 u5 u6 v2 v3 v4 v5 v6 w2 w3 w4 w5 w6 v7
...
The first five attributes are the history of "u", the next five the history of "v", and the last five the history of "w", which are taken into account as attributes / variables / features to learn from. The label is the value which should be predicted. It is simply the next value after the last value of the window of the dimension to predict (since we chose a horizon of 1).
In addition, you could of course also merge other descriptive attributes into this set by using the corresponding operators.
Cheers,
Ingo
First of all, thanks a lot for your answers, which are truly helping us further... in very small steps... We are trying to implement the MultivariateSeries2WindowExamples operator, following your example of three columns (window width 5, step size 1, horizon 1). Our settings are the following:
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="create_single_attributes" value="false"/>
<parameter key="label_dimension" value="1"/>
<parameter key="series_representation" value="encode_series_by_examples"/>
<parameter key="window_size" value="5"/>
Problem: The result file doesn't give us back a window for all 3 variables as in your example. It only provides a window for the last variable/column in the data file. It seems that we didn't properly initialise the "Multivariate" function, or that we are still in univariate mode. A second question: Is it possible to enrich the initial data file with the newly created window information? Is there something like a join operator? I guess so...
Thank you very much!
here is a small example:
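A sketch of a process matching the description below (assuming the ChangeAttributeName operator for the renaming and a random target function in the generator; exact parameters may differ in your version - set a breakpoint after the renaming in the GUI to inspect the data):
<operator name="Root" class="Process" expanded="yes">
<!-- produces 100 random examples; don't expect good models from this data -->
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="number_examples" value="100"/>
<parameter key="number_of_attributes" value="3"/>
<parameter key="target_function" value="random"/>
</operator>
<!-- drop the generated label so that only the three series columns remain -->
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="label"/>
</operator>
<!-- rename att1 / att2 / att3 to u / v / w -->
<operator name="RenameU" class="ChangeAttributeName">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="u"/>
</operator>
<operator name="RenameV" class="ChangeAttributeName">
<parameter key="old_name" value="att2"/>
<parameter key="new_name" value="v"/>
</operator>
<operator name="RenameW" class="ChangeAttributeName">
<parameter key="old_name" value="att3"/>
<parameter key="new_name" value="w"/>
</operator>
<!-- window all three series; the label is taken from column index 1 (v) -->
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="label_dimension" value="1"/>
<parameter key="series_representation" value="encode_series_by_examples"/>
<parameter key="window_size" value="5"/>
</operator>
</operator>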
Please note that the data generation just produces random data (so don't expect to learn good models from this data set). The data set has three columns (I renamed them to u, v, and w) containing the series values. We have 100 points in time (encoded by the examples, hence the setting in the windowing operator).
After the breakpoint is reached, you will see the data set. After resuming the process, the result is a windowed data set containing 15 new attributes (named "Series0" through "Series14" - we should rename those...) and a label taken from the "v" column (the column with index 1).
From this windowed data set we can now learn an arbitrary regression model as described above.
Hope that helps,
Ingo
Thanks for the reply, it was quite helpful. I have two more questions I would like to throw your way and would greatly appreciate your response.
I set up the process and everything works fine, except that I run out of memory (stack overflow) using the MultiLayerPerceptron as a learner, because I start off with too many variables (89). So here is my first question:
Before employing the MultivariateSeries2WindowExamples algorithm, which algorithm should I use to decrease the number of initial variables according to intercorrelations and covariances?
Now for my second question:
In the process you posted above, you added an operator "FeatureNameFilter"; could you please explain why you need to filter the name "label"? Is it because you do not want the computer to "see" the variable it is supposed to predict? Forgive me if I've got the cat by the tail; I'm a relative newbie to this field of work.
Thanks a lot again for the speedy replies.
Deon
you can use the different feature / attribute weighting operators available, or one of the operators CovarianceMatrix (which is not able to produce feature weights) or CorrelationMatrix (which is able to produce feature weights with a certain setting), and use the operator AttributeSubsetPreprocessing right after that. Another idea might be to reduce the window width, since windowing multiplies the number of resulting attributes: the total number of attributes after windowing is number_old_attributes * window_width. The weighting approach might look like the sketch below.
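A fragment sketching the weighting idea right after the windowing, using AttributeWeightSelection to apply the weights (an alternative to the AttributeSubsetPreprocessing mentioned above; the "top k" relation and the value of k are only example settings):
<!-- compute attribute weights from the pairwise correlations -->
<operator name="CorrelationMatrix" class="CorrelationMatrix">
<parameter key="create_weights" value="true"/>
</operator>
<!-- keep only the k highest-weighted attributes -->
<operator name="AttributeWeightSelection" class="AttributeWeightSelection">
<parameter key="weight_relation" value="top k"/>
<parameter key="k" value="20"/>
</operator>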
About the FeatureNameFilter: I simply used it to create a data set with exactly the same "look and feel" as the one we discussed here. The windowing operator creates a label on its own by using the values after the horizon from the specified label column, so I don't need an extra label here. So no magic here.
Cheers,
Ingo
We've got the multivariate operator running as shown in your example. It is not easy, but every day we are getting closer to our goal, which is to predict traffic usage data for the next day(s). In order to do so we are experimenting with the MLP operator because of the numeric type attributes. But when we use the network on the windowed usage data (5 days) and try to predict the next day's figure, the prediction is quite the opposite of what we expect it to be. In fact, it seems that the network isn't predicting the value for the next day (date+1) but simply repeats yesterday's value (date-1). Perhaps because the correlation is biggest at this point? Do you have any idea what we are doing wrong, or did we create a NN that predicts the past...? Here are our settings for the MultivariateSeries2WindowExamples operator:
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="label_dimension" value="0"/>
<parameter key="series_representation" value="encode_series_by_examples"/>
<parameter key="window_size" value="5"/>
</operator>
this is where the "fun" begins...
If the model is not able to capture the underlying process, what would probably be the best prediction? Right: the last known value. So I would personally try to
- change the learner (by the way: neural nets are not really known to work well on high-dimensional data); for example, try SVMs or other linear and non-linear regression schemes
- change the learning parameters, for example the kernel parameters or the failure costs of a Support Vector Machine
- change the windowing parameters, e.g. try different amounts of history (window widths)
You could of course optimize the structure and / or the parameters automatically by using the appropriate parameter optimization operators.
Cheers,
Ingo
As you indicated, we changed the learner, and indeed the results are much better now... for the training part of the story. Now, if we try to predict the next-day value with a test data set (= training set + [date+1]), the model doesn't perform as expected. Can you please check our setup below and give us feedback:
att1 att2 att3 att4 att5 label prediction
u1 u2 u3 u4 u5 u6 performance OK : training data, day-1
u2 u3 u4 u5 u6 u7 performance OK : training data, day
u3 u4 u5 u6 u7 -- performance KO : test data, date+1, without a label value for date+1
I have the impression that the learner also expects us to provide a label value for date+1 - a value we don't know yet and which we want the model to predict. Of course we could put in a moving average, but that is not an appropriate solution.
Thanks once again!
the basic data setup looks OK. It is of course not necessary to specify the label for applying the model. I would, however, not put both data sets into the same example set, but use one data set for training and one for the application. Alternatively, you could also use the SeriesPrediction operator. Well, I am not sure if I get your point...
Using a model applier is the one and only correct solution...?
Maybe you can elaborate a bit on why you think it is not appropriate and we can see where the problems in understanding could be.
Cheers,
Ingo
Let the "fun" begin..., I don't know exactly what you are trying to say by that.. We have the impression that it's becoming harder each time we get one step closer to our goal.
Ok, here goes our daily question: We have thrown in a SVM learner as you suggested. We are quite happy with the learner perfomance on the training data, but on the testdata the results are worse than bad. So, our first question is do we need to apply XValidation on SVM learners? Machine Learning theory states that the overfitting problem doesn't apply here (like with the NN), because SVM always achieves a global, unique and optimal solution. What is the rule for applying XVal? Always on any any learner?
A second question: When we present unseen testdata to the model, SVM (justlike the NN that we tried earlier) seems to get lost. What could be the reason for this? Too few examples (we have about 1000 examples)? Or, is the universal function simply too complicate to approximate? But, if the problem is too complicate to solve, why is the SVM performing well on the trainingdata? I guess that it uses an optimal function per example (so 1000 functions in our case) instead of a single, unique and optimal solution. Please, can you clarify a bit what is going on here.
Thanks again!
Theory of course does not state that overfitting does not occur for SVMs. It is only less likely: for a given cost parameter "C", an SVM is guaranteed to find the model with the largest margin, which corresponds to the best generalizing model in this model class. BUT: set "C" high enough and the training error will get more weight - leading to perfect overfitting. A global, unique and optimal solution? For a given "C", yes. That does not mean that the model class itself cannot be sort of overfitted, or that the value for C is simply too high so that the training examples get too much weight. Besides that: the optimization inside an SVM is again heuristic (the theory is not!), so there will not be anything like a guarantee in practice.
The usage of validation schemes is a nest of misunderstandings we quite often encounter with users and customers. Error estimation (validation) is sort of completely independent of model building, and it actually should be. You use a cross validation to estimate the error of a model with certain settings on a certain input representation. This allows you to get a feeling for how well a model will perform in real life. And you should always perform cross validation to get that feeling - independently of the concrete learning method. You should even use outer cross validations for other optimizations like feature weighting / selection or parameter optimization. Often people use an extra validation set, but since it is easily possible to nest several cross validations in RapidMiner, we always recommend doing this (you will get a more robust estimation, including additional information like standard deviations etc.).
As for the reason why your SVM gets lost on unseen data: not without seeing the processes you performed. And actually, it might even be necessary to see your data.
Please don't get me wrong, and I definitely do not want to be offending, but as I said before I get the slight feeling that you lack a deep analysis or data mining background. This is of course not a problem, but it might be a good idea to start with easier prediction tasks not involving time series to get more familiar with data mining concepts in general before you advance to more complicated tasks like time series predictions. There is a lot of literature out there on these topics, and RapidMiner also provides a large amount of commented samples which should help you to get more familiar with the basic concepts. This learning process might take some time (depending on your background knowledge and the amount of time you are able to spend on this on a regular basis). If time is short, I of course can only recommend our training courses on data mining in general and time series predictions in particular. However, there are also several people out there who learnt everything only with the help of literature and this forum, so I assume this is also possible if you are patient. But in order to help you, I definitely need a bit more concrete information than "does not work. why?", sorry.
Cheers,
Ingo
We've got your message. We will dive into the nitty-gritty of data mining. We will come back to this forum once we have mastered this black beauty. Just a final question: Are there any robust regression learners out there that are consistent in terms of error/performance on both training and test data? The learners we have tried are very clever on training data, but the opposite on test data. But this is probably because of our lack of expertise or our process setup:
TRAINING:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="D:\Ecco\Package\Data\Process\gen.xls"/>
<parameter key="id_column" value="2"/>
</operator>
<operator name="FeatureValueTypeFilter" class="FeatureValueTypeFilter">
<parameter key="except_features_of_type" value="real"/>
<parameter key="filter_special_features" value="true"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="label_dimension" value="0"/>
<parameter key="series_representation" value="encode_series_by_examples"/>
<parameter key="window_size" value="10"/>
</operator>
<operator name="ExcelExampleSetWriter (2)" class="ExcelExampleSetWriter">
<parameter key="excel_file" value="D:\Ecco\Package\Data\gen.xls"/>
</operator>
<operator name="OperatorChain (3)" class="OperatorChain" expanded="yes">
<operator name="RVMLearner" class="RVMLearner">
<parameter key="max_iteration" value="2"/>
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="D:\Ecco\Package\Data\Models\out\gen.mod"/>
</operator>
</operator>
</operator>
Here we take the gen.xls time series file created by the ExcelExampleSetWriter and add an example [date+1] without a label (see the description in the previous post).
TEST:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource (2)" class="ExcelExampleSource">
<parameter key="excel_file" value="D:\Ecco\Package\Data\gen.xls"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="label_column" value="11"/>
</operator>
<operator name="ModelLoader" class="ModelLoader">
<parameter key="model_file" value="D:\Ecco\Package\Data\Models\out\gen.mod"/>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="ExcelExampleSetWriter" class="ExcelExampleSetWriter">
<parameter key="excel_file" value="D:\Ecco\Package\Data\Models\xls\gen.xls"/>
</operator>
</operator>
The process seems to execute fine; we have time series, good training results, but poor test predictions.
Thanks once again!
This is actually not (only) a question of robustness of methods but most often a question of finding the correct model class and learning parameters and / or a more suitable input representation. The whole problem of machine learning is to find out which parameters perform well on unseen test data. A 1-NN learner usually is not very robust in your sense, but starts to get more robust when k is increased. A decision tree is often quite good on test data (almost as good as on training data) once you have found out where to prune. Assuming you have found a well-fitting model class, a fine-tuned SVM (main parameters: C and the corresponding kernel parameters) is a very robust learning scheme and will generalize well on unseen data. But all these methods will often fail on test data if the model class / parameters / input representation is not chosen appropriately.
If such a generally robust learning scheme, which is always as good on test data as on training data, existed, we could actually stop working on these methods, since it is often much easier to optimize an already found model to deliver better performance than to find a model which performs well at all.
Here are some comments on your processes (more would only be possible if I could work directly on the data...):
1.) Does your data contain a lot of trend? The default windowing used by you does not remove trend, so you predict absolute values. Without trend this is usually OK, but if there is a (strong) trend, predictions on completely unseen data are hardly possible, since the absolute value which should be predicted did not occur before and hence cannot be predicted (as an absolute value). Possible solutions are to remove the trend beforehand (e.g. by calculating a linear regression model, creating the predictions, and subtracting the predictions from the values with a feature construction operator; see the fragment below) or to predict relative changes (not possible with the standard windowing / modelling discussed so far in this thread).
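A fragment sketching the detrending variant (this assumes a numerical time index as the only regular attribute while the trend is learned; the attribute names in the renaming and in the expression are hypothetical - adjust them to the actual name of the prediction attribute in your version):
<!-- learn the global trend: linear regression of the value on the time index -->
<operator name="TrendModel" class="LinearRegression">
</operator>
<!-- attach the trend predictions to the data -->
<operator name="ApplyTrend" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<!-- give the prediction a plain name so it can be used in an expression -->
<operator name="RenamePrediction" class="ChangeAttributeName">
<parameter key="old_name" value="prediction(label)"/>
<parameter key="new_name" value="trend"/>
</operator>
<!-- replace the value by its deviation from the trend -->
<operator name="Detrend" class="AttributeConstruction">
<list key="function_descriptions">
<parameter key="detrended" value="label - trend"/>
</list>
</operator>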
2.) RVM learning is veeeeery slow (as I am sure you noticed...). For that reason you probably reduced the number of iterations. But then nothing is learned at all; in the first iterations the data is merely memorized: voila --> overfitting. I would replace the RVM with, for example, the JMySVMLearner or the LibSVMLearner with a linear (dot) kernel function first and play around with C (e.g. try values like 1, 10, 100, 1000...).
3.) I am assuming that you did not make any error when adding the new values. However, this is a manual process and therefore error-prone. So why not first estimate the performance with a cross validation / sliding window validation in order to optimize the model settings, parameters, and input representation before you start to apply the model to external test data?
Hope that helps. Cheers,
Ingo
I have a few questions that I wish to ask you.
1. The W-MultiLayerPerceptron algorithm: I have used it before with good results, but currently I cannot use it, for I have too many variables. It runs out of memory (stack overflow).
Is there possibly a way of changing the settings (I am using the default settings) in order to reduce the amount of memory used? Or is there maybe another way of increasing the chances of the algorithm working by changing the settings of my computer (of my computer, not RM)?
2. If I have purely numerical attributes as well as a numerical label, what would be the best settings for the LibSVMLearner?
3. Is there a possible way I can use the MultivariateSeries2WindowExamples algorithm with a nominal label?
4. If I have two different algorithms, one for the learning (and generalization) and one for the testing, can I use the same preprocessing on both processes before the MultivariateSeries2WindowExamples algorithm? I am afraid that for instance a correlation matrix with create weights ticked might yield different attributes between the learning process and the testing process to be taken out, therefore yielding a different amount of variables after the MultivariateSeries2WindowExamples algorithm.
I hope I have not been too vague in my questions.
Thanks a lot,
Deon
http://rapid-i.com/content/view/17/40/
The memory management is described at the end of the document. Please note that the amount of used memory must be manually increased for non-Windows systems anyway.
Another option of course is to use the 64-bit version if you have a 64-bit Windows operating system and more than 4 GB of main memory.
I am not too familiar with the MLP algorithm, but you can probably reduce the number of inner nodes, leading to less memory consumption; I am not sure how this is done, though. For the NeuralNet learner, this can be set via the hidden layer parameters.
You definitely should set the SVM type to either epsilon-SVR or nu-SVR. The RBF kernel usually is a good choice but you should optimize the kernel parameter gamma. Another important parameter is C which often must be decreased. You can perform this parameter optimization automatically with RapidMiner by employing either the GridParameterOptimization or the EvolutionaryParameterOptimization.
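A sketch of such an optimization, wrapping the cross validation from earlier in this thread (the grids for gamma and C are only example values, and the exact syntax of the parameter list may differ between versions):
<operator name="GridParameterOptimization" class="GridParameterOptimization" expanded="yes">
<list key="parameters">
<!-- example grids: gamma for the RBF kernel and the cost parameter C -->
<parameter key="LibSVMLearner.gamma" value="0.001,0.01,0.1,1"/>
<parameter key="LibSVMLearner.C" value="1,10,100,1000"/>
</list>
<operator name="XValidation" class="XValidation" expanded="yes">
<operator name="LibSVMLearner" class="LibSVMLearner">
<list key="class_weights">
</list>
<parameter key="svm_type" value="epsilon-SVR"/>
<parameter key="kernel_type" value="rbf"/>
</operator>
<operator name="ApplierChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="Performance" class="Performance">
</operator>
</operator>
</operator>
</operator>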
Only by applying the preprocessing operator "Nominal2Numerical" first. This operator just maps each nominal value to the internally used real-value index.
You should write the AttributeWeights created during training into a file, then reload and apply the training weights on the test set as well, instead of recreating the weights on the test set. Attribute weights and attribute selections are a special case which must be handled via the corresponding IO operators and operators like AttributeWeightSelection (look at the sample processes for some examples; a sketch follows below). For all other preprocessing steps, you can either apply them on the test data again (e.g. a user-based discretization) or create a preprocessing model during training (by ticking the corresponding parameter) and apply this preprocessing model on the test data.
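For the weights, a sketch of the two halves (the file name is a placeholder; AttributeWeightsWriter and AttributeWeightsLoader are the IO operators meant above, though the parameter keys are from memory):
<!-- training process: create the weights and store them -->
<operator name="CorrelationMatrix" class="CorrelationMatrix">
<parameter key="create_weights" value="true"/>
</operator>
<operator name="AttributeWeightsWriter" class="AttributeWeightsWriter">
<parameter key="attribute_weights_file" value="train.wgt"/>
</operator>
<!-- test process: reload the training weights and apply the same selection -->
<operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
<parameter key="attribute_weights_file" value="train.wgt"/>
</operator>
<operator name="AttributeWeightSelection" class="AttributeWeightSelection">
<parameter key="weight_relation" value="top k"/>
<parameter key="k" value="20"/>
</operator>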
Cheers,
Ingo
Thanks a lot for all the good advice. I really appreciate it. I feel a little bad that I only take and never give, yet I am going to ask still more. I hope you don't mind...
Back to data preparation: for a time series set of variables, would it be best to use the time series itself (i.e. the Close price of EURUSD), or the difference between the Close prices of the previous day and the current day?
If I do take the differences, would you suggest discretizing the data or not? I am sure discretizing the data if I only use the Close prices would not make a lot of sense; am I wrong in saying that?
I'm a slow learner, but I'm getting there one step at a time.
Thanks
Deon
I just read my own post again and found a mistake: actually, the value for C should often be increased instead of decreased.
That actually depends on how much trend occurs in total. If basically each range of "possible" values has been available in the past or if there is no global trend, you can just predict the absolute values. Otherwise, it is better to predict the relative changes.
There is a new meta learner "RelativeRegression", but I am not sure whether it was already part of the last release or whether it is only in the CVS so far. Right now I have no access to RM to check it, sorry. I would say this depends on the setting. If sharp changes are the most interesting point and the actual value does not have too much meaning, I would discretize into something like "stable" (small changes), "sharp change" etc. You could also include classes like "Buy Signal" or "Sell Signal" and so on. In some settings you are more interested in the concrete value, and then I would of course not discretize.
Cheers,
Ingo