Label versus prediction(label) in series forecasting
Hi,
I have been working with series forecasting similar to what is described at http://www.simafore.com/blog/bid/109175/Time-Series-Forecasting-using-RapidMiner-for-cost-modeling-2-of-2. However, I noticed that when comparing trends, the author plotted the prediction(label) pattern against the Commodity A-0 pattern. I do not understand why he did not plot label vs. prediction(label) to examine the trend.
I have run some forecasts in which comparing the trend of prediction(label) to the label gives a trend accuracy of 50%. However, when I compare prediction(label) against the "label"-0 value, I get a much higher trend accuracy of 75%.
What am I missing here?
Answers
So it's been a while since I looked at that post, but based on his snapshot, this is why I think he did that.
In the upper flow he used the Windowing operator to create a label column from the existing time series data. What happened was that he selected the output attribute column and called it label. The Windowing operator then offset the remaining attribute columns and renamed them Commodity-0, Commodity-1, etc. Depending on the window size, you'll have attributes like xyz-0, xyz-1, xyz-2, etc. He then trained the model to predict the label column from the rest of the attributes, shifted in time.
The second flow is where he used the scoring data. He had to use the same window size but made sure the label wasn't there; after all, that's what you want to predict. So that set generates a prediction(label), and he compared it with Commodity-0 because (and here's my guess) there was only one time series.
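To make the windowing step concrete, here is a minimal Python sketch. It is not the RapidMiner Windowing operator and not the blog author's process; the function name `window_series`, its parameters, and the sample numbers are all made up for illustration. It assumes columns are ordered oldest first, so the last attribute in each row plays the role of the "-0" column.

```python
def window_series(series, window_size, horizon=1):
    """Turn a 1-D series into (lagged attributes, label) rows.

    Attributes are ordered oldest first, so the last attribute in each
    row is the most recent known value (the "-0" column in this thread).
    The label is the value `horizon` steps after that last attribute.
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = list(series[start:start + window_size])
        label = series[start + window_size + horizon - 1]
        rows.append((window, label))
    return rows


# Hypothetical data, only to show the shape of the result.
rows = window_series([10, 11, 13, 12, 15, 16], window_size=3)
for attrs, label in rows:
    print(attrs, "->", label)
```

Each row pairs the most recent `window_size` values with the value that comes `horizon` steps later, so the label is always one step ahead of the "-0" attribute when the horizon is 1.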
Have you checked out all the time series stuff on the Community? I wrote a very detailed response on using the Windowing operator here: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Time-Series-using-Windowing-operator-in-RapidMiner/m-p/31791
Hi Thomas,
Thank you for your immediate response. I have a good understanding of the window operator and I also watched your videos on series forecasting.
Still, I am a bit perplexed. I have developed a very simple series forecasting process: the series goes through a Windowing operator and is then fed into a neural network model that was trained on a windowed series with a horizon of 1. The output shows, for each row: t-4, t-3, t-2, t-1, t-0, prediction(label). What is curious is that the trend of prediction(label) follows that of t-0. Consider the rows below:
t-10,t-9,t-8,t-7,t-6, prediction(t-5)
t-9,t-8,t-7,t-6,t-5, prediction(t-4)
t-8,t-7,t-6,t-5,t-4, prediction(t-3)
....
....
Why do I observe that prediction(t-5), prediction(t-4), prediction(t-3) follow t-6, t-5, t-4?
Shouldn't I observe that prediction(t-5), prediction(t-4), prediction(t-3) follow t-5, t-4, t-3 when testing trend forecast accuracy?
Much appreciated, Thomas!
Well, you have to be careful here. If you use a Windowing operator across many attributes to predict a label, then each attribute column should influence the outcome of the label. So it's not really OK to assume that prediction(label) follows att1-0 if you also have att1-1, att1-2, att1-3, etc. If you can post some sample data and your process, we can inspect it.
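While waiting for the data, the two comparisons discussed in this thread can be put on the same footing with a small Python sketch. The definition of trend agreement used here (the fraction of steps where two series move in the same direction) is an assumption and may differ from how RapidMiner computes its trend accuracy. The helper names and the numbers are hypothetical, and the "prediction" is deliberately constructed to echo the last input value; it is not output from any model in this thread.

```python
def directions(values):
    """Sign of each step in a series: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def trend_agreement(x, y):
    """Fraction of steps where x and y move in the same direction."""
    dx, dy = directions(x), directions(y)
    if len(dx) != len(dy) or not dx:
        raise ValueError("series must have equal length of at least 2")
    matches = sum(1 for p, q in zip(dx, dy) if p == q)
    return matches / len(dx)


# Made-up zig-zag series, horizon of 1.
series = [1, 3, 2, 5, 4, 6, 5]
t0 = series[:-1]      # last known value in each window (the "-0" column)
label = series[1:]    # true next value for each window

# Hypothetical prediction that just echoes the last input plus an offset.
prediction = [v + 0.1 for v in t0]

print("vs t-0:  ", trend_agreement(prediction, t0))
print("vs label:", trend_agreement(prediction, label))
```

On this constructed data, a prediction that only echoes t-0 agrees with t-0 on every step while disagreeing with the label. That is one way a higher score against the "-0" column than against the label could arise; whether it applies to your process can only be judged from the actual data.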
Hi Thomas,
Below is the XML code, and I have attached a file for your reference. When I compare the trend against p-0 I get 88% trend accuracy, yet against the label I get 74%, and this happens with all wavelet details. Is it wavelet related, or is it more an effect of the neural network?
Best,
Malek