Forecasting based on Two dependent variables
Hi,
I want to create a forecast based on two variables, i.e., the weather forecast and historical actual readings. How is that implemented in RapidMiner?
Also, I was able to generate a 12-month forecast using vector linear regression, but the forecast is way out of range. What techniques can we leverage to improve it?
Further, how do we use two variables with the windowing technique? I don't see the operator "MultivariateSeries2WindowExamples" in RapidMiner 7.5, as discussed in some examples here. Am I missing something?
Also, is an R forecast model more accurate than the RapidMiner windowing technique? If yes, how do I use an R model in RapidMiner (do you have a sample R forecast model I can leverage)?
Thanks in advance
S
Best Answers
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
So a couple of things: that video of mine is available on my YouTube channel here: https://www.youtube.com/watch?v=UmGIGEJMmN8&t=2s
With respect to power consumption, you might want to check out this paper on using RM and SVMs to forecast electricity consumption. It starts on page 46 or thereabouts. I would add in your weather as an attribute and then use your power consumption as the label.
From there, build a process like this that loads and ETLs your data, and use a Windowing and a Sliding Window Validation operator. Insert an SVM set to the RBF kernel and then optimize around the gamma and C parameters.
For a sample process, you can try the one in this thread: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Financial-Time-Series-Prediction/m-p/33456
Answers
The old Multivariate2examples operator is now called the Windowing operator, so it's still there.
With respect to algorithms, you might want to look at using an SVM with an RBF kernel, optimized around the gamma and C parameters. There's been some research suggesting they're pretty good for time series modeling. I have some links on my site here: http://www.neuralmarkettrends.com/using-svm-kernels-for-time-series-analysis/
With respect to multiple labels, with the Windowing operator you have to select one attribute column as your label, but your other variable can be loaded into the window model as well. Windowing in RapidMiner is considered a "cross sectional based" approach to building a forecast, which is different from, say, an ARIMA forecast. Here's another great read by Simafore: http://www.simafore.com/blog/bid/109175/Time-Series-Forecasting-using-RapidMiner-for-cost-modeling-2-of-2
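To make the windowing idea concrete, here is a minimal sketch in plain Python (not RapidMiner code). It assumes two aligned series of equal length; the function name `make_windows`, its parameters, and the sample numbers are hypothetical and chosen only for illustration. Each window of past load and weather values becomes one row of attributes, and the load value that follows the window becomes the label.

```python
def make_windows(load, weather, window_size=3, horizon=1):
    """Turn two aligned series into (features, label) rows.

    Each row holds the last `window_size` load values followed by the last
    `window_size` weather values; the label is the load value `horizon`
    steps after the end of the window.
    """
    if len(load) != len(weather):
        raise ValueError("series must be aligned and of equal length")
    rows = []
    last_start = len(load) - window_size - horizon
    for start in range(last_start + 1):
        end = start + window_size              # one past the last window index
        features = list(load[start:end]) + list(weather[start:end])
        label = load[end + horizon - 1]        # the value to be predicted
        rows.append((features, label))
    return rows


load = [10, 12, 15, 14, 13, 16]        # hypothetical monthly power readings
weather = [20, 22, 27, 25, 24, 28]     # hypothetical monthly temperatures
for features, label in make_windows(load, weather, window_size=3):
    print(features, "->", label)
```

Only the load series supplies the label here; the weather series contributes attributes only, which mirrors the "one label, other variables in the window" setup described above.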
You can use an R package to make predictions and feed them into your RapidMiner window as well.
Thank you for the quick reply sir.
I see a reference to a video about forecasting using SVM, http://www.neuralmarkettrends.com/wp-content/uploads/Rapidminer5-vid10.mp4, which apparently isn't working. Is there a sample process you can share that illustrates how an SVM forecast is done using time series data?
Also, with respect to multiple dependent variables: I have one set of historical readings and another of future (forecasted) values. I'm not sure how to design a process that has a historical attribute column as the label and feeds the forecasted values into the window model as the other variable. Basically, I want to forecast the next 12 months of power consumption based on forecasted weather; as you can see, power and weather are directly proportional.
Would you be kind enough to share a sample process?
As Always, Many Thanks for your time
Would appreciate your valuable input on the below.
Designing a process that has a historical attribute column as the label and forecasted values as the other variable in the window model? E.g., historical power readings as the label and forecasted weather readings as the other variable.
Thx
Raj
Thank you sir. I'll try to follow along and see how it goes.
Sir,
Is there a way we can train the model on 2 years' worth of history, so that the weather seasonality is accounted for in the power consumption forecast? Further, is there an easy way to create dummy variables for a testing data set that contains future dates, to predict the power consumption?
Also, LibSVM, which has the RBF kernel, is giving me quite a challenge. I couldn't run it successfully; I'm having problems with the LibSVM data type and the Performance operator, which don't seem to work together. Do you happen to have any examples of LibSVM with the RBF kernel that I can use?
Thx
I use the RapidMiner SVM instead of the LibSVM. I find it to be easier.
You can train the model on 2 years of data; just toggle on cumulative training on the Sliding Window Validation operator.
There is a Generate Date operator that might let you create dummy testing data.
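The difference between a sliding and a cumulative training window can be pictured with a short sketch in plain Python. This reflects the general idea only and is not RapidMiner's implementation; `validation_splits` and its parameters are hypothetical names. With cumulative training switched on, every fold trains on all rows from the start of the series up to the test window, so with monthly data the later folds can cover two full years of history.

```python
def validation_splits(n_rows, train_width, test_width, step, cumulative=False):
    """Yield (train_indices, test_indices) pairs in time order.

    With cumulative=False the training window slides forward and keeps a
    fixed width; with cumulative=True it always starts at row 0 and grows.
    """
    start = 0
    while start + train_width + test_width <= n_rows:
        train_end = start + train_width
        train_start = 0 if cumulative else start
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + test_width))
        yield train, test
        start += step


for train, test in validation_splits(10, train_width=4, test_width=2, step=2,
                                     cumulative=True):
    print("train rows", train[0], "to", train[-1], "| test rows", test)
```

Test rows always come after the training rows, so the model is never evaluated on data older than what it was trained on.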
I guess I'm missing something here. I do not see an RBF kernel in SVM, I only see it in LibSVM. Is that correct?
Thank you, I just figured out that the radial kernel is nothing but RBF.
Thx
It's my mistake, I should have told you that RBF stands for Radial Basis Function.
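For reference, the textbook form of the radial basis function kernel is K(x, y) = exp(-gamma * ||x - y||^2). The sketch below computes it in plain Python; it assumes RapidMiner's radial kernel follows this usual definition, and the function name `rbf_kernel` is hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """Textbook RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical points have similarity 1; similarity decays with distance,
# and a larger gamma makes it decay faster.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=5.0))
```

This is why gamma is worth optimizing together with C: it controls how local the influence of each training example is.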
No worries sir.
But I really need your advice here. I'm completely lost with the forecasted numbers from RapidMiner. I have also tried optimizing the parameters for the SVM, but no luck. My performance is still the same with or without optimized parameters.
I'm attaching the source files and my process export. I have also attached some of my questions along with the screenshots in AnalysisQustions.doc.
As Always Many Thanks for your time
Raj
From your process, I don't see any optimization. What parameters besides C and gamma did you try to optimize?
I'm running some optimization and it appears that your Window Size and Training Window Width give a big boost to performance. I'm doing a pretty big optimization, so it's taking about 3 hours to run, but it's leading me to believe that you need to build a bigger cross section of data to train on.
Thanks Sir. Apparently my results are better using KNN than SVM. I currently have the accuracy below with KNN:
prediction_trend_accuracy: 0.695 +/- 0.021 (mikro: 0.695). Can you recommend some parameter optimizations for finding an optimal K value?
I'm puzzled that most research papers recommend SVM, yet my dataset clearly doesn't work well with SVM. I guess I'm lost :-)
Well, with k-NN you might want to use Numerical Measures and optimize around K and the distance measure (e.g. cosine, Manhattan, etc.). You could also try the GLM modeler.
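As a rough illustration of optimizing around K and the distance measure, here is a minimal k-NN regression sketch in plain Python with a tiny grid search. It is not RapidMiner code; the function names, the feature layout (temperature and previous load), and the numbers are hypothetical, and mean absolute error on a later hold-out period stands in for whatever performance measure your process uses.

```python
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k, distance):
    """Average the labels of the k training rows closest to `query`."""
    nearest = sorted(train, key=lambda row: distance(row[0], query))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def grid_search(train, holdout, ks, distances):
    """Return (mean_abs_error, k, distance_name) for the best combination."""
    results = []
    for k in ks:
        for name, distance in distances.items():
            errors = [abs(knn_predict(train, x, k, distance) - y)
                      for x, y in holdout]
            results.append((sum(errors) / len(errors), k, name))
    return min(results)


# Hypothetical rows: ([temperature, previous load], load)
train = [([20.0, 10.0], 12.0), ([22.0, 12.0], 15.0),
         ([27.0, 15.0], 14.0), ([25.0, 14.0], 13.0)]
holdout = [([24.0, 13.0], 16.0), ([28.0, 16.0], 17.0)]   # later in time than train
measures = {"manhattan": manhattan, "euclidean": euclidean}
print(grid_search(train, holdout, ks=[1, 2, 3], distances=measures))
```

Because distance-based methods are sensitive to scale, normalizing the attributes before the search is usually worth trying as well.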
Thank you sir. I will keep you posted on my optimization.
Sir, I made quite an improvement on this model. I'm seeking your assistance in operationalizing it.
This is based on the research paper that you cited earlier: http://community.rapidminer.com/ejhxb44622/attachments/ejhxb44622/GettingStartForum/629/1/proceedings_rcomm_2010.pdf
On page 46, under the section "The application is used in the following sequence", I need clarity on step (2), which states "Testing data table is refreshed with new dummy variables, values for label is omitted."
How do we generate this testing data set? I believe my forecast dataset would look something like below. When I pass this data, the model fails to apply, since the training data set and testing data set are different (we do not have values for the label); we only have the weather forecast.
Do you have any example process like this that I can infer from? I would appreciate your input on this.
So when you train the model you will need a label, but for testing you need to omit that label. Here's a sample to try out.
Thank you sir.
In your process I don't see the option of reading a testing data set containing just the forecasted weather and date. I want to forecast based on my forecasted weather data.
Also, when I exclude the load values column, it states that the input example set does not match the training example set. As of now I'm thinking of generating dummy load values and feeding them to the model, just to get the format the model requires in order to forecast. Am I heading in the right direction here?
Just disconnect the 2nd Windowing operator from the main process branch and load in your test data there. Just make sure all the preprocessing is the same. Then check the order of execution and run it.
That lower branch generates the 'dummy' dates that you use to forecast out into the future BUT your test data will need to contain your temperature and other input variables.
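One common way to score future rows when some inputs are lagged load values is to forecast recursively: predict one step, then feed that prediction back in as the newest lag for the next step. The plain-Python sketch below illustrates the idea under stated assumptions and is not necessarily what the sample process does. `predict` stands in for any trained model, the feature layout (lagged loads followed by the forecast temperature) is hypothetical, and `toy_model` is a made-up stand-in, not a real fitted model.

```python
def forecast_recursively(predict, recent_load, weather_forecast, window_size):
    """Forecast one step per weather value, feeding predictions back as lags.

    `predict` maps a feature list to a load value. Scoring rows contain only
    input attributes (lagged loads plus forecast weather) and no label.
    """
    history = list(recent_load)
    forecasts = []
    for temperature in weather_forecast:
        features = history[-window_size:] + [temperature]
        prediction = predict(features)
        forecasts.append(prediction)
        history.append(prediction)     # the prediction becomes the newest lag
    return forecasts


def toy_model(features):
    """Stand-in model: mean of the lagged loads plus a simple weather effect."""
    *lags, temperature = features
    return sum(lags) / len(lags) + temperature / 10


print(forecast_recursively(toy_model, [10.0, 12.0, 14.0], [20.0, 30.0],
                           window_size=2))
```

The key point is that the scoring rows have exactly the same input columns, in the same order, as the training rows; only the label is absent.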
Dear Sir,
Please excuse me. I'm not able to follow this logic.
Each day we get weather forecast results, which are stored in a database. I'm reading the testing data directly from the database and trying to generate the load forecast. However, since I only have the weather forecast, the model is failing at Apply Model (since the system load is missing in the training data set).
Below is my code. Would you kindly let me know what needs to be done here to make it forecast?