Expert opinion requested on Time Series based Prediction
So I'm studying machine learning using RapidMiner and I'm now focusing on Time Series Prediction.
My son earns some pocket money by trading stocks, forex and futures. He does that using technical analysis of prices.
He looks for an asset that shows a clear trend, in line with "Selecting Forecasting Methods in Data Science".
Then my son zooms in on the M-curves of the latest period. Using support lines and trendlines he "predicts" the future price of the asset.
My thought was to give him a Machine Learning perspective on his analyses.
So I looked at Oil Futures and built a process model on it, based on the daily "Last" values. The model looks like this:
In the upper left I have implemented 3 RapidMiner Macros:
- %{AnalysesDateFrom}: The date from which to pick up the "wave to surf" trend, as my son does.
- %{PredictionDateFrom}: This is my "hold off" parameter. I train the model up to this date and let it predict from this date onwards.
- %{PredictionHorizon}: It sets the Horizon parameter in the Windowing operator, in the Sliding Window Validation operator and in the Forecasting Performance operator (inside the subprocess of the Sliding Window Validation operator), so that all operators work with the same Horizon.
When I run the model with %{AnalysesDateFrom} = "Feb 10, 2016", %{PredictionHorizon} = 10 and %{PredictionDateFrom} = "Aug 28, 2017" (last month), the model returns a prediction_trend_accuracy: 0.625 +/- 0.099 (mikro: 0.625). I take this accuracy figure for what it is worth: I know that value prediction is "slippery ice", so I'm more interested in trends.
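To make the accuracy figure more concrete, here is a minimal Python sketch of one plausible reading of a trend accuracy: the share of rows where the predicted direction (up or not, relative to the current value) matches the actual direction. This is an assumption for illustration, not RapidMiner's documented formula; the function name and all numbers are made up.

```python
def trend_accuracy(current, actual, predicted):
    """Share of rows where prediction and actual move the same way from current.

    Assumption: "no change" is treated the same as "down".
    """
    hits = 0
    for c, a, p in zip(current, actual, predicted):
        actual_up = a > c        # did the real value rise above today's value?
        predicted_up = p > c     # did the model predict a rise?
        if actual_up == predicted_up:
            hits += 1
    return hits / len(current)


# Made-up values, one entry per example row.
current = [50, 51, 52, 53]     # value on the day the prediction is made
actual = [52, 50, 54, 55]      # real value one horizon later
predicted = [51, 52, 53, 52]   # model output for the same target day

score = trend_accuracy(current, actual, predicted)
print(score)
```

A score around 0.5 on such a measure would be no better than guessing the direction, which is one way to put a figure like 0.625 into perspective.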
My question is related to the next graph in which I have plotted the prediction together with the real "Last" values.
This plot clearly shows that the trend of the prediction matches the trend of the real "Last" values.
What I don't understand is that the prediction and the real "Last" values are "in phase" with each other. I would expect a phase shift between the two lines, equal to the Prediction Horizon. That phase shift is not visible. What am I doing wrong here?
The only explanation I can think of for the absence of a phase shift is that the value of an asset at a given moment is the best indication of its future value. In other words: the current value of an asset already incorporates its future values. That would explain why the lines of real values and predicted values are in sync with each other. But I am not sure, so I would like to receive an expert opinion on this.
Best Answer
luc_bartkowski
I have found the answer to my question.
My source data is sorted by date because I use an SQL query that limits how much data is loaded, to stay within my RM license.
I use the following SQL: "SELECT * FROM oil ORDER BY Date DESC LIMIT 9999".
As a result, the example set that goes into the Windowing operators is sorted by Date in descending order.
When I sort the example set by Date in ascending order, the model works as expected.
See next pictures
Added Sort operator
New resulting example set
Prediction is almost equivalent to oilLast-0
No phase shift. Of course not. Question answered.
Watch out for the sort order of dates. Apparently RM does not use the value of a Date attribute during "Set Role to ID"; instead it establishes the ID on the basis of the input sort order.
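The effect of the sort order can be sketched in plain Python. The example below assumes a windowing step that pairs each row with the row `horizon` positions further down the list, whatever the dates say; label_pairs and the prices are hypothetical and only illustrate the idea, they are not RapidMiner's implementation.

```python
from datetime import date

# Made-up rows as returned by "ORDER BY Date DESC": newest first.
rows_desc = [
    (date(2017, 8, 25), 47.9),
    (date(2017, 8, 24), 47.4),
    (date(2017, 8, 23), 48.4),
    (date(2017, 8, 22), 47.8),
]


def label_pairs(rows, horizon):
    """Pair each row's date with the date of the row `horizon` positions later.

    The pairing is purely positional: the Date values are never inspected.
    """
    return [
        (rows[i][0], rows[i + horizon][0])
        for i in range(len(rows) - horizon)
    ]


# Descending input: the "label" row is older than the feature row.
pairs_desc = label_pairs(rows_desc, horizon=1)

# Ascending input: the label row is newer, which is what a forecast needs.
rows_asc = sorted(rows_desc, key=lambda row: row[0])
pairs_asc = label_pairs(rows_asc, horizon=1)

print(pairs_desc[0])  # feature date first, label date second
print(pairs_asc[0])
```

With the descending input every label date lies before its feature date, so the model is effectively trained to "predict" the past.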
Greetings,
Luc
Answers
Hi Luc,
I think the answer is simple. Your prediction(label) is the oil price tomorrow (or in x days). While your OilLast-0 is the OilPrice today (-0 indicates 0 days lookback).
You most likely want to also generate a Label in the lower windowing and compare this to the prediction.
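As a rough illustration of that comparison, the Python sketch below checks whether a prediction sits closer to the label (the future price) or to the "-0" value (today's price). All names and numbers are made up for the example; it only assumes that the three columns are aligned row by row.

```python
def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Made-up values, one entry per example row.
current = [50.0, 51.0, 52.0, 51.5]      # OilLast-0: price on the row's date
label = [52.0, 51.5, 53.0, 54.0]        # price `horizon` days later
prediction = [50.2, 50.9, 52.1, 51.6]   # model output for each row

error_vs_label = mean_abs_error(prediction, label)
error_vs_current = mean_abs_error(prediction, current)

# If the prediction is much closer to today's price than to the label,
# the model is mostly repeating the current value.
print(error_vs_label, error_vs_current)
```

In this made-up case the prediction tracks the current value far more closely than the label, which is the pattern that shows up as two lines "in phase" when prediction and OilLast-0 are plotted together.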
Cheers,
Martin
Thank you @mschmitz for your fast reply,
"Your prediction(label) is the oil price tomorrow (or in x days)".
"While your OilLast-0 is the OilPrice today (-0 indicates 0 days lookback)."
I understand both. But I don't see it in the graph and the example set:
I also checked the example sets of the upper and lower Windowing operators using a "breakpoint after".
My source data is stored in MySQL. I compared both to make sure that my process is working as expected.
The value of the Label on August 25 is based upon the "Last" value of August 11 in the source data.
August 11 is 10 trading days before August 25 so that is correct.
The values of the "-0" attributes of August 25 are equivalent to the attributes of the source data on August 25.
That is also correct.
The results of the lower Windowing operator are also correct.
The values of all "-0" attributes on September 28 are equivalent to the source data on September 28.
So I don't understand the graph. It looks like the prediction is following the real values of "Last" instead of the other way around.
This is my process model:
Thanks for your support.
Cheers,
Luc
I'll be posting my Historical Volatility process when I have a chance to write it up. In that process you take a time series at t=0 and predict the value at t+1. From there you can see how it works.
I think I have found the answer to my question.
But I don't know how to implement it.
Looking at the problem again, I conclude the following:
On August 11 the Label should look at the "Last" value of August 25 to learn/validate. See the blue markup.
Instead, as I indicated before, the upper Windowing operator is looking backwards: it puts the "Last" value of August 11 as Label on August 25.
I tried to configure the upper Windowing operator to look forwards instead of backwards by setting a negative value, -10 or (%{PredictionHorizon})*-1, in the Horizon parameter. The Horizon parameter of the Windowing operator doesn't accept negative integers, only positive integers. So I don't know how to implement a forward-looking Label instead of a backward-looking Label.
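Outside RapidMiner, a forward-looking label of this kind can be sketched in a few lines of Python. The sketch assumes the series is sorted oldest first; make_windows, window_size and the prices are hypothetical names and values used only for illustration.

```python
def make_windows(series, window_size, horizon):
    """Build (features, label) pairs from a series sorted oldest first.

    features: the `window_size` values ending at index t (the "-0" value)
    label:    the value `horizon` steps after index t
    """
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        t = start + window_size - 1          # index of the "-0" value
        features = series[start:t + 1]
        label = series[t + horizon]          # looks forward, never backward
        examples.append((features, label))
    return examples


prices = [50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0]  # made-up, oldest first
windows = make_windows(prices, window_size=3, horizon=2)

for features, label in windows:
    print(features, "->", label)
```

The last `horizon` rows of the series get no example of their own, because their future value is not known yet.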
I'm using v. 7.6001
Greetings,
Luc
Hi!
I tested your process and have to say that I really like it, but I have one question about it:
How do you show any forecasts then (like for the following week)?
Thanks!