How to get forecast values for future periods from time series data
Hello
I have been using RapidMiner for a week and am having difficulty developing time series models. I have been using R for about 6 months and have integrated R scripts with RapidMiner without any hassle.
In R we have a forecast function that takes the fitted model and the number of future periods to forecast, and returns the forecasted values for those periods. This is exactly what I integrated in RapidMiner, and I got the forecast values using an ARIMA model.
My question is: how can I get the forecast values using ARIMA in RapidMiner itself, without integrating an R script? Most of the examples I have seen on the web do the model evaluation on training data only. In simple terms, I have historic weekly data as
week units
week1 20
week2 35
week3 27
week4 12
......
week500 43
I need forecast values for
week501 ?
week502 ?
week503 ?
...
week552 ?
Links I referred to:
https://www.youtube.com/watch?v=w0vSSEq2bn0
Thank you
Best Answer
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
Well then, you're going to love this p, d, q optimizing process. Make sure you have the Fin/Econ extension installed too; it pulls some sample data. This process optimizes around the AIC.
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="quantx1:yahoo_historical_data_extractor" compatibility="1.0.006" expanded="true" height="82" name="Yahoo Historical Stock Data" width="90" x="45" y="120">
<parameter key="I agree to abide by Yahoo's Terms &amp; Conditions on financial data usage" value="true"/>
<parameter key="Quick Stock Ticker Data" value="true"/>
<parameter key="Stock Ticker" value="S&amp;P"/>
<parameter key="select_fields" value="VOLUME|OPEN|DAY_LOW|DAY_HIGH|CLOSE|ADJUSTED_CLOSE"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="date_start" value="2013-01-01"/>
<parameter key="date_end" value="2015-06-03"/>
</operator>
<operator activated="true" class="rename" compatibility="7.4.000" expanded="true" height="82" name="Rename" width="90" x="179" y="120">
<parameter key="old_name" value="S&amp;P_ADJUSTED_CLOSE"/>
<parameter key="new_name" value="AClose"/>
<list key="rename_additional_attributes">
<parameter key="S&amp;P_CLOSE" value="Close"/>
<parameter key="S&amp;P_DAY_HIGH" value="High"/>
<parameter key="S&amp;P_DAY_LOW" value="Low"/>
<parameter key="S&amp;P_OPEN" value="Open"/>
<parameter key="S&amp;P_VOLUME" value="Volume"/>
</list>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply" width="90" x="313" y="120"/>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Forecasting" width="90" x="715" y="435">
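<parameter key="script" value="### Call this R script to forecast with the chosen ARIMA model&#10;rm_main = function(data)&#10;{&#10;    library(forecast)&#10;    sp &lt;- data&#10;    sp$Date &lt;- as.Date(sp$Date)&#10;    arima &lt;- arima(ts(sp$Close), order=c(3,1,3))&#10;    print(arima)&#10;    # forecast() dispatches to the Arima method; recent forecast releases no longer export forecast.Arima()&#10;    arimaforecast &lt;- forecast(arima, h=5)&#10;    print(arimaforecast)&#10;    return(as.data.frame(arimaforecast))&#10;}&#10;"/>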
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="7.4.000" expanded="true" height="103" name="Optimize Parameters (Grid)" width="90" x="514" y="300">
<list key="parameters">
<parameter key="Set p.value" value="[0;3;3;linear]"/>
<parameter key="Set d.value" value="[0.0;2;2;linear]"/>
<parameter key="Set q.value" value="[0.0;4;4;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="set_macro" compatibility="7.4.000" expanded="true" height="76" name="Set p" width="90" x="112" y="30">
<parameter key="macro" value="p"/>
<parameter key="value" value="3.0"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.4.000" expanded="true" height="76" name="Set d" width="90" x="112" y="120">
<parameter key="macro" value="d"/>
<parameter key="value" value="2.0"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.4.000" expanded="true" height="76" name="Set q" width="90" x="112" y="210">
<parameter key="macro" value="q"/>
<parameter key="value" value="4.0"/>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="112" name="ARIMA" width="90" x="447" y="75">
<parameter key="script" value="### Call this R script to get AIC from ARIMA models&#10;rm_main = function(data)&#10;{&#10;    sp &lt;- data&#10;    sp$Date &lt;- as.Date(sp$Date)&#10;    arima &lt;- arima(sp$Close, order=c(%{p},%{d},%{q}))&#10;    #print(arima$aic)&#10;    return(as.data.table(arima$aic))&#10;}&#10;"/>
<description align="center" color="transparent" colored="false" width="126">Fit ARIMA model in R with different (p,d,q)</description>
</operator>
<operator activated="true" class="extract_performance" compatibility="7.4.000" expanded="true" height="76" name="Performance" width="90" x="581" y="75">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="V1"/>
<parameter key="example_index" value="1"/>
<parameter key="optimization_direction" value="minimize"/>
</operator>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="76" name="Log" width="90" x="715" y="75">
<list key="log">
<parameter key="aic" value="operator.Performance.value.performance"/>
<parameter key="p" value="operator.Set p.parameter.value"/>
<parameter key="d" value="operator.Set d.parameter.value"/>
<parameter key="q" value="operator.Set q.parameter.value"/>
</list>
</operator>
<connect from_port="input 1" to_op="Set p" to_port="through 1"/>
<connect from_op="Set p" from_port="through 1" to_op="ARIMA" to_port="input 1"/>
<connect from_op="Set d" from_port="through 1" to_op="ARIMA" to_port="input 2"/>
<connect from_op="Set q" from_port="through 1" to_op="ARIMA" to_port="input 3"/>
<connect from_op="ARIMA" from_port="output 1" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="36"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Yahoo Historical Stock Data" from_port="example set" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Multiply" from_port="output 3" to_op="Forecasting" to_port="input 1"/>
<connect from_op="Forecasting" from_port="output 1" to_port="result 3"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="90"/>
<portSpacing port="sink_result 2" spacing="162"/>
<portSpacing port="sink_result 3" spacing="126"/>
<portSpacing port="sink_result 4" spacing="36"/>
<description align="center" color="yellow" colored="false" height="62" resized="true" width="816" x="305" y="18">Look at Economic Time Series Data (automatically pulled) from public sites and integrate with ARIMA in R extension</description>
<description align="center" color="yellow" colored="false" height="133" resized="true" width="635" x="490" y="83">Charts for data. Identify any unusual observations for all attributes: day low, high, open, close, adjusted close, volume</description>
<description align="center" color="yellow" colored="false" height="177" resized="true" width="626" x="500" y="228">Find the optimized parameters for ARIMA (iterative, and TAKES TIME: about 1 min)&lt;br/&gt;Use R extension for ARIMA models&lt;br/&gt;For this demo data, we have ARIMA(3,1,3) as the best fit&lt;br/&gt;To choose the best-fit model: check the Log result, rank by AIC,&lt;br/&gt;and find the values of p, d, q corresponding to the minimum AIC</description>
<description align="center" color="yellow" colored="false" height="116" resized="true" width="415" x="713" y="414">Apply ARIMA(3,1,3) for forecasting&lt;br/&gt;Predict the next 5 days' close price&lt;br/&gt;</description>
</process>
</operator>
</process>
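For readers who want to see the selection logic outside of RapidMiner, here is a minimal sketch in plain Python of what the grid search in the process above amounts to: expand each range, try every (p, d, q), and keep the order with the lowest score. The helper names (grid_values, best_order, toy_score) are hypothetical, the assumption that a range with N steps expands to N + 1 values including both endpoints is mine, and toy_score is only a stand-in for the AIC that the ARIMA script returns; it does not fit any model.

```python
from itertools import product


def grid_values(lo, hi, steps):
    """Expand a '[lo;hi;steps;linear]' range into steps + 1 evenly spaced integers.

    Assumption: both endpoints are included, so 3 steps over 0..3 gives 0, 1, 2, 3.
    """
    return [round(lo + i * (hi - lo) / steps) for i in range(steps + 1)]


def best_order(score, p_values, d_values, q_values):
    """Score every (p, d, q) and return the best order, its score, and the full log."""
    log = [(score(p, d, q), (p, d, q))
           for p, d, q in product(p_values, d_values, q_values)]
    best_score, best = min(log)  # lowest score wins, as with AIC
    return best, best_score, log


# Ranges copied from the process: p in [0;3;3], d in [0;2;2], q in [0;4;4].
P = grid_values(0, 3, 3)
D = grid_values(0, 2, 2)
Q = grid_values(0, 4, 4)


def toy_score(p, d, q):
    # Stand-in for the AIC of a fitted ARIMA(p, d, q); NOT a real AIC.
    # Chosen only so that the minimum lands on (3, 1, 3), like the demo data.
    return (p - 3) ** 2 + (d - 1) ** 2 + (q - 3) ** 2


order, value, log = best_order(toy_score, P, D, Q)
print(len(log), order, value)
```

In the process itself, the score comes from the ARIMA operator's R script and the log is what the Log operator collects. For the 52-week horizon in the original question, the Forecasting script would need h=52 instead of h=5, with the order set to whatever the grid search picks for your own data.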
Answers
Hi,
There is no native ARIMA operator in the Series Extension yet. However, you can try to tweak the attached process. This process comes from Bala's and Vijay's book and forecasts point values in RapidMiner based on the previous time series patterns.
With regard to ARIMA on training data, you can embed your ARIMA R script inside a Sliding Window Validation operator and test the performance.
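To make the sliding window idea concrete, here is a rough sketch in plain Python. It only illustrates the general technique: the parameter names (train_width, horizon, step) are my own and are not taken from the operator, the naive "repeat the last value" forecaster is a placeholder for the ARIMA call, and the series is the first four weekly values from the question padded with made-up numbers so the windows have room to move.

```python
def sliding_windows(n, train_width, horizon, step):
    """Yield (train_indices, test_indices) pairs that move forward through n points."""
    start = 0
    while start + train_width + horizon <= n:
        train = range(start, start + train_width)
        test = range(start + train_width, start + train_width + horizon)
        yield train, test
        start += step


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(history, horizon):
    # Placeholder model: repeat the last observed value.
    # This is where the ARIMA fit and forecast would go.
    return [history[-1]] * horizon


# week1..week4 from the question, followed by made-up values for illustration.
units = [20, 35, 27, 12, 30, 25, 18, 40]

errors = []
for train, test in sliding_windows(len(units), train_width=4, horizon=2, step=2):
    history = [units[i] for i in train]
    actual = [units[i] for i in test]
    errors.append(mean_absolute_error(actual, naive_forecast(history, len(actual))))

print(errors)
```

Each window trains only on past values and is scored on the values that follow it, so the error reflects how the model would have done on data it had not seen.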
Many thanks for your reply.
I managed to fit the ARIMA model with some constant p and q values. However, I am having a hard time trying different combinations of p and q, as I have to do it manually.
Syed
Wow. Great.
Many thanks for your help.
Syed