How can future predictions be made with a Time Series model in RapidMiner?
This is probably the most frequently asked question about RapidMiner Time Series Prediction. Some examples:
- http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/time-series-predicting-the-future-price/m-p/27559#M20198
- http://community.rapidminer.com/t5/Getting-Started-Forum/Time-Series-Forecasting-for-Data/m-p/37315
- The comments in http://www.simafore.com/blog/bid/109175/Time-Series-Forecasting-using-RapidMiner-for-cost-modeling-2-of-2
- https://stackoverflow.com/questions/36906717/rapidminer-timeseries-prediction/36914858#36914858
We all ask the same question.
We want to be able to make predictions for tomorrow, the next week(s), the next month(s), whatever the horizon and the unit of time are.
Some have even asked the same question several times in their topic/post, as if the question were not clear.
The following picture therefore illustrates the question.
How to:
- Calculate the prediction on Oct 5 (black markup);
- Using the "-0 attributes" from the Windowing operator (blue markup);
- In order to predict (orange arrow) the unknown future Last value on Oct 5 (red markup);
- In the same way the "-0 attributes" (brown markup) are used to calculate the predictions (yellow markup) in the train/validation/test example set;
- But without being able to use the unknown future Last value (red markup) as a label (green markup)?
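The situation in the list above can be sketched outside RapidMiner. This is a minimal illustration in plain Python, not the Windowing operator itself: the function names, the made-up closing values, and the simple "drift" model are all assumptions chosen to keep the example short. The point is that the newest window has no label, yet a model fitted on the labelled windows can still be applied to it.

```python
from statistics import mean

def make_windows(series, width):
    """Split a series into labelled training windows plus one unlabelled window."""
    rows, labels = [], []
    for end in range(width, len(series)):
        rows.append(series[end - width:end])  # lagged attributes of one example
        labels.append(series[end])            # the known next value (the label)
    future_row = series[-width:]              # newest window: its label is unknown
    return rows, labels, future_row

def fit_drift(rows, labels):
    """Hypothetical toy model: the average step from a window's last value to its label."""
    return mean(label - row[-1] for row, label in zip(rows, labels))

def predict(drift, row):
    """Apply the toy model to any window, labelled or not."""
    return row[-1] + drift

closing = [50.0, 51.0, 52.0, 53.0, 54.0, 55.0]  # made-up values, not real prices
rows, labels, future_row = make_windows(closing, width=3)
drift = fit_drift(rows, labels)
next_value = predict(drift, future_row)  # forecast for the day after the last observation
```

Here `future_row` plays the role of the blue "-0 attributes" and `next_value` the role of the unknown value on Oct 5. Any regression learner could replace the toy drift model.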
The only answer with a possible solution is from @Thomas_Ott: http://community.rapidminer.com/t5/Getting-Started-Forum/Time-Series-Forecasting-for-Data/m-p/37315 . His answer links to an XML RapidMiner process in http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Recall-Error/m-p/37302#U37302. That XML implements a complex process that manipulates macros, chains multiple Windowing operators in series, uses Remember/Recall and Loop operators, and even includes a "Materialize Data" operator to free up memory in RapidMiner. The process is also based on the Yahoo Historical Data operator, which unfortunately no longer works, so I'm not even sure whether this process answers the question of this topic.
Is there a simpler process/solution available to answer the question of this topic?
Thanks,
Luc
Best Answer
luc_bartkowski Member Posts: 46 Maven
Happy to do so Martin.
To be honest, I don't know much about ARIMA yet. I will watch some YouTube videos on ARIMA this weekend.
But luckily RapidMiner offers an Optimize Parameters operator.
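For readers new to parameter optimization, the general idea can be sketched as follows. This is only an assumption-laden illustration in plain Python: it uses an exhaustive search rather than an evolutionary one, a hypothetical moving-average forecaster rather than ARIMA, and a holdout error that I chose for the example. All names and values are made up.

```python
def moving_average_forecast(history, width):
    """Forecast the next value as the mean of the last `width` observations."""
    window = history[-width:]
    return sum(window) / len(window)

def holdout_error(series, width, holdout):
    """Mean absolute one-step-ahead error over the last `holdout` points."""
    errors = []
    for i in range(len(series) - holdout, len(series)):
        forecast = moving_average_forecast(series[:i], width)  # only past data is used
        errors.append(abs(series[i] - forecast))
    return sum(errors) / len(errors)

def best_width(series, candidates, holdout):
    """Try every candidate parameter value and keep the one with the lowest error."""
    return min(candidates, key=lambda w: holdout_error(series, w, holdout))

series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]  # made-up values
chosen = best_width(series, candidates=[1, 2, 3], holdout=3)
```

An optimizer does the same thing at larger scale: it proposes parameter values, scores each candidate, and keeps the best one.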
So @tftemme this is the result:
And the model:
And the XML
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Optimization Cycles" width="90" x="782" y="289">
<parameter key="macro" value="OptimizeCycles"/>
<parameter key="value" value="50"/>
</operator>
<operator activated="true" class="generate_macro" compatibility="7.6.001" expanded="true" height="68" name="Current Date" width="90" x="782" y="85">
<list key="function_descriptions">
<parameter key="CurrentDate" value="date_now()"/>
</list>
</operator>
<operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Prediction Horizon" width="90" x="916" y="85">
<parameter key="macro" value="PredictionHorizon"/>
<parameter key="value" value="20"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Training From Date" width="90" x="782" y="187">
<parameter key="macro" value="AnalysesDateFrom"/>
<parameter key="value" value="2016/02/11"/>
</operator>
<operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Training To Date" width="90" x="916" y="187">
<parameter key="macro" value="TrainingDateTo"/>
<parameter key="value" value="%{CurrentDate}"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Get/Join Data" width="90" x="112" y="85">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Oil Futures" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="jdbc_connectors:read_database" compatibility="7.6.001" expanded="true" height="68" name="Read Database (2)" width="90" x="45" y="34">
<parameter key="define_connection" value="predefined"/>
<parameter key="connection" value="MySQL"/>
<parameter key="database_system" value="MySQL"/>
<parameter key="define_query" value="query"/>
<parameter key="query" value="SELECT * FROM `oil` ORDER BY Date desc limit 9999"/>
<parameter key="use_default_schema" value="true"/>
<parameter key="prepare_statement" value="false"/>
<enumeration key="parameters"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="false" class="store" compatibility="7.6.001" expanded="true" height="68" name="Store (11)" width="90" x="45" y="136">
<parameter key="repository_entry" value="//Cloud Repository/Samples/data/oilfuturesvw"/>
</operator>
<operator activated="false" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve (2)" width="90" x="179" y="136">
<parameter key="repository_entry" value="../data/oilfuturesvw"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="514" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="Volume|Settle|Previous Day Open Interest|Open|Low|Last|High|Date"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Date (8)" width="90" x="648" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="date_type" value="date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="keep_old_attribute" value="false"/>
</operator>
<operator activated="true" class="rename" compatibility="7.6.001" expanded="true" height="82" name="Rename (8)" width="90" x="782" y="34">
<parameter key="old_name" value="Date"/>
<parameter key="new_name" value="oilDate"/>
<list key="rename_additional_attributes">
<parameter key="High" value="oilHigh"/>
<parameter key="Low" value="oilLow"/>
<parameter key="Open" value="oilOpen"/>
<parameter key="Previous Day Open Interest" value="oilPrevDayOpenInt"/>
<parameter key="Settle" value="oilSettle"/>
<parameter key="Volume" value="oilVolume"/>
<parameter key="Last" value="oilLast"/>
</list>
</operator>
<connect from_op="Read Database (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Date (8)" to_port="example set input"/>
<connect from_op="Nominal to Date (8)" from_port="example set output" to_op="Rename (8)" to_port="example set input"/>
<connect from_op="Rename (8)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="Oil Futures" from_port="out 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="sort" compatibility="7.6.001" expanded="true" height="82" name="Sort" width="90" x="246" y="85">
<parameter key="attribute_name" value="oilDate"/>
<parameter key="sorting_direction" value="increasing"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="112" y="238">
<parameter key="attribute_name" value="oilLast"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="oilDate" value="id"/>
<parameter key="oilLast" value="regular"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="238">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="oilLast|oilHigh|oilLow|oilOpen|oilSettle|oilPrevDayOpenInt|oilVolume"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Start of Trend" width="90" x="380" y="238">
<parameter key="parameter_expression" value="date_after(oilDate, date_parse_custom(%{AnalysesDateFrom}, &quot;yyyy/MM/dd&quot;))"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list"/>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Train until Hold-off" width="90" x="514" y="238">
<parameter key="parameter_expression" value="date_before(oilDate, date_now())"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list"/>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="124" name="Multiply (3)" width="90" x="112" y="544"/>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="ARIMA Predict Last" width="90" x="246" y="442">
<process expanded="true">
<operator activated="true" class="optimize_parameters_evolutionary" compatibility="7.6.001" expanded="true" height="145" name="Optimize Parameters (Evolutionary)" width="90" x="112" y="34">
<list key="parameters">
<parameter key="ARIMA Trainer.q:_order_of_the_moving-average_model" value="[0.0;100.0]"/>
<parameter key="ARIMA Trainer.p:_order_of_the_autoregressive_model" value="[0.0;100.0]"/>
</list>
<parameter key="error_handling" value="ignore error"/>
<parameter key="max_generations" value="%{OptimizeCycles}"/>
<parameter key="use_early_stopping" value="true"/>
<parameter key="generations_without_improval" value="2"/>
<parameter key="specify_population_size" value="true"/>
<parameter key="population_size" value="5"/>
<parameter key="keep_best" value="true"/>
<parameter key="mutation_type" value="gaussian_mutation"/>
<parameter key="selection_type" value="tournament"/>
<parameter key="tournament_fraction" value="0.25"/>
<parameter key="crossover_prob" value="0.9"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="show_convergence_plot" value="false"/>
<process expanded="true">
<operator activated="true" class="timeseries:arima_trainer" compatibility="0.1.002" expanded="true" height="103" name="ARIMA Trainer" width="90" x="246" y="34">
<parameter key="time_series_attribute" value="oilLast"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="oilDate"/>
<parameter key="p:_order_of_the_autoregressive_model" value="39"/>
<parameter key="d:_degree_of_differencing" value="0"/>
<parameter key="q:_order_of_the_moving-average_model" value="94"/>
<parameter key="estimate_constant" value="false"/>
</operator>
<operator activated="true" class="timeseries:apply_forecast" compatibility="0.1.002" expanded="true" height="82" name="Apply Forecast" width="90" x="380" y="34">
<parameter key="forecast_horizon" value="%{PredictionHorizon}"/>
<parameter key="forecast_only" value="false"/>
<parameter key="add_combined_output" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Applying the ARIMA model to forecast the next values of the time series; the horizon is set by the PredictionHorizon macro</description>
</operator>
<connect from_port="input 1" to_op="ARIMA Trainer" to_port="example set"/>
<connect from_op="ARIMA Trainer" from_port="forecast model" to_op="Apply Forecast" to_port="forecast model"/>
<connect from_op="ARIMA Trainer" from_port="performance" to_port="performance"/>
<connect from_op="Apply Forecast" from_port="example set" to_port="result 1"/>
<connect from_op="Apply Forecast" from_port="original" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
<operator activated="false" class="timeseries:arima_trainer" compatibility="0.1.002" expanded="true" height="103" name="ARIMA Trainer (6)" width="90" x="313" y="187">
<parameter key="time_series_attribute" value="oilLast"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="oilDate"/>
<parameter key="p:_order_of_the_autoregressive_model" value="2"/>
<parameter key="d:_degree_of_differencing" value="0"/>
<parameter key="q:_order_of_the_moving-average_model" value="92"/>
<parameter key="estimate_constant" value="false"/>
</operator>
<operator activated="false" class="timeseries:apply_forecast" compatibility="0.1.002" expanded="true" height="82" name="Apply Forecast (6)" width="90" x="447" y="187">
<parameter key="forecast_horizon" value="%{PredictionHorizon}"/>
<parameter key="forecast_only" value="false"/>
<parameter key="add_combined_output" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Applying the ARIMA model to forecast the next values of the time series; the horizon is set by the PredictionHorizon macro</description>
</operator>
<connect from_port="in 1" to_op="Optimize Parameters (Evolutionary)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Evolutionary)" from_port="performance" to_port="out 1"/>
<connect from_op="Optimize Parameters (Evolutionary)" from_port="result 1" to_port="out 2"/>
<connect from_op="ARIMA Trainer (6)" from_port="forecast model" to_op="Apply Forecast (6)" to_port="forecast model"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (3)" width="90" x="447" y="442">
<parameter key="attribute_name" value="forecast of oilLast"/>
<parameter key="target_role" value="regular"/>
<list key="set_additional_roles">
<parameter key="oilLast and forecast" value="regular"/>
</list>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="ARIMA Predict High" width="90" x="246" y="595">
<process expanded="true">
<operator activated="true" class="optimize_parameters_evolutionary" compatibility="7.6.001" expanded="true" height="145" name="Optimize Parameters (2)" width="90" x="112" y="34">
<list key="parameters">
<parameter key="ARIMA Trainer (4).q:_order_of_the_moving-average_model" value="[0.0;100.0]"/>
<parameter key="ARIMA Trainer (4).p:_order_of_the_autoregressive_model" value="[0.0;100.0]"/>
</list>
<parameter key="error_handling" value="ignore error"/>
<parameter key="max_generations" value="%{OptimizeCycles}"/>
<parameter key="use_early_stopping" value="true"/>
<parameter key="generations_without_improval" value="2"/>
<parameter key="specify_population_size" value="true"/>
<parameter key="population_size" value="5"/>
<parameter key="keep_best" value="true"/>
<parameter key="mutation_type" value="gaussian_mutation"/>
<parameter key="selection_type" value="tournament"/>
<parameter key="tournament_fraction" value="0.25"/>
<parameter key="crossover_prob" value="0.9"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="show_convergence_plot" value="false"/>
<process expanded="true">
<operator activated="true" class="timeseries:arima_trainer" compatibility="0.1.002" expanded="true" height="103" name="ARIMA Trainer (4)" width="90" x="246" y="34">
<parameter key="time_series_attribute" value="oilHigh"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="oilDate"/>
<parameter key="p:_order_of_the_autoregressive_model" value="94"/>
<parameter key="d:_degree_of_differencing" value="0"/>
<parameter key="q:_order_of_the_moving-average_model" value="56"/>
<parameter key="estimate_constant" value="false"/>
</operator>
<operator activated="true" class="timeseries:apply_forecast" compatibility="0.1.002" expanded="true" height="82" name="Apply Forecast (4)" width="90" x="380" y="34">
<parameter key="forecast_horizon" value="%{PredictionHorizon}"/>
<parameter key="forecast_only" value="false"/>
<parameter key="add_combined_output" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Applying the ARIMA model to forecast the next values of the time series; the horizon is set by the PredictionHorizon macro</description>
</operator>
<connect from_port="input 1" to_op="ARIMA Trainer (4)" to_port="example set"/>
<connect from_op="ARIMA Trainer (4)" from_port="forecast model" to_op="Apply Forecast (4)" to_port="forecast model"/>
<connect from_op="ARIMA Trainer (4)" from_port="performance" to_port="performance"/>
<connect from_op="Apply Forecast (4)" from_port="example set" to_port="result 1"/>
<connect from_op="Apply Forecast (4)" from_port="original" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
<operator activated="false" class="timeseries:arima_trainer" compatibility="0.1.002" expanded="true" height="103" name="ARIMA Trainer (7)" width="90" x="112" y="238">
<parameter key="time_series_attribute" value="oilHigh"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="oilDate"/>
<parameter key="p:_order_of_the_autoregressive_model" value="94"/>
<parameter key="d:_degree_of_differencing" value="0"/>
<parameter key="q:_order_of_the_moving-average_model" value="56"/>
<parameter key="estimate_constant" value="false"/>
</operator>
<operator activated="false" class="timeseries:apply_forecast" compatibility="0.1.002" expanded="true" height="82" name="Apply Forecast (7)" width="90" x="246" y="238">
<parameter key="forecast_horizon" value="%{PredictionHorizon}"/>
<parameter key="forecast_only" value="false"/>
<parameter key="add_combined_output" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Applying the ARIMA model to forecast the next values of the time series; the horizon is set by the PredictionHorizon macro</description>
</operator>
<connect from_port="in 1" to_op="Optimize Parameters (2)" to_port="input 1"/>
<connect from_op="Optimize Parameters (2)" from_port="performance" to_port="out 1"/>
<connect from_op="Optimize Parameters (2)" from_port="result 1" to_port="out 2"/>
<connect from_op="ARIMA Trainer (7)" from_port="forecast model" to_op="Apply Forecast (7)" to_port="forecast model"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (4)" width="90" x="447" y="595">
<parameter key="attribute_name" value="forecast of oilHigh"/>
<parameter key="target_role" value="regular"/>
<list key="set_additional_roles">
<parameter key="oilHigh and forecast" value="regular"/>
</list>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="ARIMA Predict Low" width="90" x="246" y="748">
<process expanded="true">
<operator activated="true" class="optimize_parameters_evolutionary" compatibility="7.6.001" expanded="true" height="124" name="Optimize Parameters (3)" width="90" x="112" y="34">
<list key="parameters">
<parameter key="ARIMA Trainer (5).q:_order_of_the_moving-average_model" value="[0.0;100.0]"/>
<parameter key="ARIMA Trainer (5).p:_order_of_the_autoregressive_model" value="[0.0;100.0]"/>
</list>
<parameter key="error_handling" value="ignore error"/>
<parameter key="max_generations" value="%{OptimizeCycles}"/>
<parameter key="use_early_stopping" value="true"/>
<parameter key="generations_without_improval" value="2"/>
<parameter key="specify_population_size" value="true"/>
<parameter key="population_size" value="5"/>
<parameter key="keep_best" value="true"/>
<parameter key="mutation_type" value="gaussian_mutation"/>
<parameter key="selection_type" value="tournament"/>
<parameter key="tournament_fraction" value="0.25"/>
<parameter key="crossover_prob" value="0.9"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="show_convergence_plot" value="false"/>
<process expanded="true">
<operator activated="true" class="timeseries:arima_trainer" compatibility="0.1.002" expanded="true" height="103" name="ARIMA Trainer (5)" width="90" x="112" y="85">
<parameter key="time_series_attribute" value="oilLow"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="oilDate"/>
<parameter key="p:_order_of_the_autoregressive_model" value="94"/>
<parameter key="d:_degree_of_differencing" value="0"/>
<parameter key="q:_order_of_the_moving-average_model" value="56"/>
<parameter key="estimate_constant" value="false"/>
</operator>
<operator activated="true" class="timeseries:apply_forecast" compatibility="0.1.002" expanded="true" height="82" name="Apply Forecast (5)" width="90" x="380" y="238">
<parameter key="forecast_horizon" value="%{PredictionHorizon}"/>
<parameter key="forecast_only" value="false"/>
<parameter key="add_combined_output" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Applying the ARIMA model to forecast the next values of the time series; the horizon is set by the PredictionHorizon macro</description>
</operator>
<connect from_port="input 1" to_op="ARIMA Trainer (5)" to_port="example set"/>
<connect from_op="ARIMA Trainer (5)" from_port="forecast model" to_op="Apply Forecast (5)" to_port="forecast model"/>
<connect from_op="ARIMA Trainer (5)" from_port="performance" to_port="performance"/>
<connect from_op="Apply Forecast (5)" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="timeseries:arima_trainer" compatibility="0.1.002" expanded="true" height="103" name="ARIMA Trainer (2)" width="90" x="112" y="238">
<parameter key="time_series_attribute" value="oilLow"/>
<parameter key="has_indices" value="true"/>
<parameter key="indices_attribute" value="oilDate"/>
<parameter key="p:_order_of_the_autoregressive_model" value="94"/>
<parameter key="d:_degree_of_differencing" value="0"/>
<parameter key="q:_order_of_the_moving-average_model" value="56"/>
<parameter key="estimate_constant" value="false"/>
</operator>
<operator activated="false" class="timeseries:apply_forecast" compatibility="0.1.002" expanded="true" height="82" name="Apply Forecast (2)" width="90" x="246" y="238">
<parameter key="forecast_horizon" value="%{PredictionHorizon}"/>
<parameter key="forecast_only" value="false"/>
<parameter key="add_combined_output" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Applying the ARIMA model to forecast the next values of the time series; the horizon is set by the PredictionHorizon macro</description>
</operator>
<connect from_port="in 1" to_op="Optimize Parameters (3)" to_port="input 1"/>
<connect from_op="Optimize Parameters (3)" from_port="performance" to_port="out 1"/>
<connect from_op="Optimize Parameters (3)" from_port="result 1" to_port="out 2"/>
<connect from_op="ARIMA Trainer (2)" from_port="forecast model" to_op="Apply Forecast (2)" to_port="forecast model"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (5)" width="90" x="447" y="748">
<parameter key="attribute_name" value="forecast of oilLow"/>
<parameter key="target_role" value="regular"/>
<list key="set_additional_roles">
<parameter key="oilLow and forecast" value="regular"/>
</list>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Graph Last" width="90" x="581" y="442">
<parameter key="parameter_expression" value="date_after(oilDate, date_set(date_now(), -eval(%{PredictionHorizon})-1, DATE_UNIT_DAY))"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list"/>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Graph High" width="90" x="581" y="595">
<parameter key="parameter_expression" value="date_after(oilDate, date_set(date_now(), -eval(%{PredictionHorizon})-1, DATE_UNIT_DAY))"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list"/>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Graph Low" width="90" x="581" y="748">
<parameter key="parameter_expression" value="date_after(oilDate, date_set(date_now(), -eval(%{PredictionHorizon})-1, DATE_UNIT_DAY))"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list"/>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="join" compatibility="7.6.001" expanded="true" height="82" name="Join" width="90" x="715" y="493">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="inner"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="oilDate" value="oilDate"/>
</list>
<parameter key="keep_both_join_attributes" value="false"/>
</operator>
<operator activated="true" class="join" compatibility="7.6.001" expanded="true" height="82" name="Join (2)" width="90" x="715" y="595">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="inner"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="oilDate" value="oilDate"/>
</list>
<parameter key="keep_both_join_attributes" value="false"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Oil Forecast" width="90" x="849" y="493">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="oilLow|oilLast|oilHigh|oilDate|forecast of oilLow|forecast of oilLast|forecast of oilHigh"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="false" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="82" name="Generate Report" width="90" x="849" y="595">
<parameter key="report_name" value="Oil Prediction"/>
<parameter key="format" value="HTML"/>
<parameter key="report_to_repository" value="false"/>
<parameter key="html_output_directory" value="/Users/Luc/Dropbox/RapidMiner Prediction Reports"/>
<parameter key="pdf_output_file" value="/Users/Luc/Dropbox/OilPrediction.pdf"/>
<parameter key="html_logo_file" value="/Users/Luc/Dropbox/RapidMiner Prediction Reports/logo.png"/>
<parameter key="html_image_format" value="png"/>
<parameter key="image_col_span" value="8"/>
<parameter key="image_row_span" value="17"/>
<parameter key="page_size" value="0"/>
<parameter key="page_format" value="0"/>
<parameter key="template_type" value="0"/>
<parameter key="pdf_template_file" value="/no file selected"/>
<parameter key="image_template_file" value="/no file selected"/>
<parameter key="image_alignment" value="0"/>
<parameter key="set_background_color" value="true"/>
<parameter key="background_color" value="255,255,255"/>
<parameter key="page_width" value="595"/>
<parameter key="page_height" value="842"/>
<parameter key="top_page_margin" value="36"/>
<parameter key="bottom_page_margin" value="36"/>
<parameter key="left_page_margin" value="36"/>
<parameter key="right_page_margin" value="36"/>
<parameter key="section_one_font" value="courier"/>
<parameter key="section_one_font_size" value="12.0"/>
<parameter key="section_one_font_style_bold" value="false"/>
<parameter key="section_one_font_style_italic" value="false"/>
<parameter key="section_one_font_style_underline" value="false"/>
<parameter key="section_one_font_style_strikethrough" value="false"/>
<parameter key="section_one_font_color" value="0,0,0"/>
<parameter key="section_two_font" value="courier"/>
<parameter key="section_two_font_size" value="12.0"/>
<parameter key="section_two_font_style_bold" value="false"/>
<parameter key="section_two_font_style_italic" value="false"/>
<parameter key="section_two_font_style_underline" value="false"/>
<parameter key="section_two_font_style_strikethrough" value="false"/>
<parameter key="section_two_font_color" value="0,0,0"/>
<parameter key="section_three_font" value="courier"/>
<parameter key="section_three_font_size" value="12.0"/>
<parameter key="section_three_font_style_bold" value="false"/>
<parameter key="section_three_font_style_italic" value="false"/>
<parameter key="section_three_font_style_underline" value="false"/>
<parameter key="section_three_font_style_strikethrough" value="false"/>
<parameter key="section_three_font_color" value="0,0,0"/>
<parameter key="section_four_font" value="courier"/>
<parameter key="section_four_font_size" value="12.0"/>
<parameter key="section_four_font_style_bold" value="false"/>
<parameter key="section_four_font_style_italic" value="false"/>
<parameter key="section_four_font_style_underline" value="false"/>
<parameter key="section_four_font_style_strikethrough" value="false"/>
<parameter key="section_four_font_color" value="0,0,0"/>
<parameter key="section_five_font" value="courier"/>
<parameter key="section_five_font_size" value="12.0"/>
<parameter key="section_five_font_style_bold" value="false"/>
<parameter key="section_five_font_style_italic" value="false"/>
<parameter key="section_five_font_style_underline" value="false"/>
<parameter key="section_five_font_style_strikethrough" value="false"/>
<parameter key="section_five_font_color" value="0,0,0"/>
<parameter key="text_content_font" value="courier"/>
<parameter key="text_content_font_size" value="12.0"/>
<parameter key="text_content_font_style_bold" value="false"/>
<parameter key="text_content_font_style_italic" value="false"/>
<parameter key="text_content_font_style_underline" value="false"/>
<parameter key="text_content_font_style_strikethrough" value="false"/>
<parameter key="text_content_font_color" value="0,0,0"/>
<parameter key="system_fonts" value="false"/>
<parameter key="directory_fonts" value="false"/>
<parameter key="table_column_number" value="16"/>
<parameter key="table_header_color" value="128,128,128"/>
<parameter key="table_row_color_one" value="255,255,255"/>
<parameter key="table_row_color_two" value="192,192,192"/>
</operator>
<operator activated="false" class="reporting:report" compatibility="5.3.000" expanded="true" height="68" name="Report" width="90" x="849" y="697">
<parameter key="report_name" value="Oil Prediction"/>
<parameter key="report_item_header" value="%{CurrentDate}"/>
<parameter key="specified" value="true"/>
<parameter key="reportable_type" value="Data Table"/>
<parameter key="renderer_name" value="Advanced Charts"/>
<list key="parameters"/>
<parameter key="image_width" value="800"/>
<parameter key="image_height" value="600"/>
</operator>
<connect from_op="Get/Join Data" from_port="out 1" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Start of Trend" to_port="example set input"/>
<connect from_op="Filter Start of Trend" from_port="example set output" to_op="Train until Hold-off" to_port="example set input"/>
<connect from_op="Train until Hold-off" from_port="example set output" to_op="Multiply (3)" to_port="input"/>
<connect from_op="Multiply (3)" from_port="output 1" to_op="ARIMA Predict Last" to_port="in 1"/>
<connect from_op="Multiply (3)" from_port="output 2" to_op="ARIMA Predict High" to_port="in 1"/>
<connect from_op="Multiply (3)" from_port="output 3" to_op="ARIMA Predict Low" to_port="in 1"/>
<connect from_op="ARIMA Predict Last" from_port="out 1" to_port="result 3"/>
<connect from_op="ARIMA Predict Last" from_port="out 2" to_op="Set Role (3)" to_port="example set input"/>
<connect from_op="Set Role (3)" from_port="example set output" to_op="Filter Graph Last" to_port="example set input"/>
<connect from_op="ARIMA Predict High" from_port="out 1" to_port="result 1"/>
<connect from_op="ARIMA Predict High" from_port="out 2" to_op="Set Role (4)" to_port="example set input"/>
<connect from_op="Set Role (4)" from_port="example set output" to_op="Filter Graph High" to_port="example set input"/>
<connect from_op="ARIMA Predict Low" from_port="out 1" to_port="result 2"/>
<connect from_op="ARIMA Predict Low" from_port="out 2" to_op="Set Role (5)" to_port="example set input"/>
<connect from_op="Set Role (5)" from_port="example set output" to_op="Filter Graph Low" to_port="example set input"/>
<connect from_op="Filter Graph Last" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Filter Graph High" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Filter Graph Low" from_port="example set output" to_op="Join (2)" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
<connect from_op="Join (2)" from_port="join" to_op="Oil Forecast" to_port="example set input"/>
<connect from_op="Oil Forecast" from_port="example set output" to_port="result 4"/>
<connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<description align="center" color="yellow" colored="true" height="352" resized="true" width="334" x="701" y="22">Process Configuration (training example set, horizon, cycles ARIMA optimization, prediction date)</description>
<description align="center" color="green" colored="true" height="166" resized="true" width="558" x="83" y="212">Select Time Series Scope</description>
<description align="center" color="gray" colored="true" height="141" resized="true" width="551" x="84" y="49">Get source data</description>
<description align="center" color="orange" colored="true" height="481" resized="true" width="278" x="81" y="395">Generate Future Predictions</description>
<description align="center" color="blue" colored="true" height="481" resized="true" width="611" x="422" y="397">Reporting</description>
</process>
</operator>
</process>
Really love RapidMiner.
Have a nice weekend.
Greetings,
Luc
Hello @luc_bartkowski - thanks for this. I agree that this is a very frequent use case and also agree that it could be easier. A quick spoiler is that the Time Series Extension is undergoing a complete rebuild (see blog post from 2 weeks ago by @tftemme). That said, I think we can help consolidate these threads here and maybe turn this into a sample for the new extension. If so, could you please post (repost?) that data set so we can work on this together?
As for the Yahoo Historical Data issue, yes, we have talked about this a lot in this forum. Numerous people have posted alternative solutions (see my KB article about Alpha Vantage or posts about using Quandl). Meanwhile we are working on pushing out a more permanent, better solution.
Scott
Personally @sgenzer I am very much looking forward to the rebuilding of the time series extension and the addition of new operators to make things easier, or to fill in gaps in the current offering (R package "forecast", anyone?).
But in the meantime @luc_bartkowski you may find that there is another sample process, which is heavily annotated, that might help you along your way. If you install the series extension, then when you open the "File>New Process" window of RapidMiner, you will be prompted with a series forecasting template, shown here (just scroll down until you see it). I think you will find it helpful.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Dear @Telcontar120,
Thank you for your answer but the "Time Series Forecasting" template doesn't predict beyond the dates of the example set either.
Greetings,
Luc
Hello @sgenzer / Scott,
I've managed to reverse engineer the "loop" solution of @Thomas_Ott and build it into my own Time Series Prediction process.
I am "close", but still "no cigar". See the following pictures and the attached XML. The first picture shows my "standard" Time Series Forecasting train/validate/test process. The second picture zooms in on the Loop subprocess.
These processes are based on the Quandl CME_CL1 Crude Oil Futures Continuous Contract 1 CL1 Front Month dataset.
Please note that I added "oil" in front of every attribute name. So attribute Open of this dataset has been renamed oilOpen.
The same for all other attributes: oilDate, oilHigh, oilLow, oilLast, etc.
The Loop subprocess generates a number of future dates following the last date of the Test example set. That number equals the horizon. But for some reason the Loop subprocess doesn't generate a new prediction(label) for every new (future) date. It copies the prediction(label) from the Remember/Recall operators (the last row of the Test example set) and adds this value, as a constant, to every new future date.
It is my understanding that Thomas' Loop subprocess implementation generates a new prediction(label) using the model and puts its value in the attribute "Close". In my humble opinion the attribute Close doesn't exist in Thomas' Loop subprocess; it should be Close-0. So I don't know whether the example process that I reused in my process is functioning properly either.
Any help to get rid of this last flaw in my process model is appreciated.
Thanks for the support.
Luc
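The symptom described above (every future date receiving the same constant value) is what one would expect when a prediction is never fed back into the window used for the next step. The following is a minimal sketch in Python, not in RapidMiner, of a recursive multi-step forecast. The function names and `drift_model` are hypothetical stand-ins for illustration; they are not part of Thomas' process or of any RapidMiner operator.

```python
def recursive_forecast(history, window, steps, predict_one_step):
    """Forecast `steps` values ahead by feeding each prediction back in."""
    values = list(history)          # copy so the caller's data is untouched
    forecasts = []
    for _ in range(steps):
        next_value = predict_one_step(values[-window:])
        forecasts.append(next_value)
        values.append(next_value)   # the new prediction joins the next window
    return forecasts


def drift_model(window_values):
    """Hypothetical one-step model: last value plus the mean step in the window."""
    steps = [b - a for a, b in zip(window_values, window_values[1:])]
    return window_values[-1] + sum(steps) / len(steps)


history = [48.0, 49.0, 50.0, 51.0]
forecasts = recursive_forecast(history, window=3, steps=3,
                               predict_one_step=drift_model)
print(forecasts)
```

If the line `values.append(next_value)` is left out, every iteration sees the same window and therefore returns the same value, which resembles the constant prediction reported in this post.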
I went back a while after that original process was posted and fixed it because it wasn't generating the closing values per day correctly. I have to look for it on my other machine.
Hi @luc_bartkowski - OK I spent some time looking at your process. Maybe I'm missing something but where you are "testing" the model you are actually forecasting forward. The output of that Apply Model operator is showing you 10-day-forward predictions of oilLast. Right?
Scott
My model has a process parameter (top right) which sets the horizon.
So I can play around with different horizon options.
This horizon is used in the training/validation process, and also in the test process.
I want to use the same horizon for future predictions.
So yes, if the horizon is set to 10 then I want to forecast the Last value of Oct 8, taking into account that the last date in the training/validate/test example set is Sep 28.
I suspect that the example model of Thomas is working only on horizon = 1. I therefore have altered my model.
My altered model selects the last n values from the test example set and puts them in a "Loop Examples" subprocess.
So the subprocess in "Loop Examples" gets the values needed to calculate the prediction(label) for future oilDates.
In the "Loop Examples" subprocess I have also managed to shift the date, i.e. oilDate, N days ahead. N = horizon again.
But then I'm stuck: I don't know what to do, or which operators to use, to get the desired predictions for future dates.
Please find the altered model in the following XML.
@Thomas_Ott
Dear Thomas, I agree.
I realized that myself as well. You solved another problem, independent of the process.
But still, it was the only template for a solution. I was happy to find any template for a solution regarding the topic.
Despite all the information, tutorials, templates, blogs and videos on the web regarding Time Series Forecasting with RapidMiner, it was your posts that pointed me to a possible solution, and I thank you for that. Again, you're doing a great job; I learned a lot from you, thank you.
Best regards,
Luc
@luc_bartkowski Thank you for your kind words. I have a bunch of time series processes that I should just organize and repost. They are super important because Time Series in RapidMiner is not very organized (as of yet) but the development team and Community have made progress.
Hi Scott,
I know why you are able to predict until Oct 3rd in your picture. That is because you downloaded the Quandl source data yesterday on Oct 4th.
Your source data includes values for oilOpen, etc. on Oct 3rd. That is the reason Oct 3rd, including a valuable prediction, is visible in your picture. But your picture doesn't show predictions beyond Oct 3rd, whatever your horizon is.
I'm sorry but I therefore cannot hit the "Solved" button on this topic.
I'm beginning to suspect that the problem, addressed in this topic, is:
The "Apply model" operator (also) always needs a Label to calculate predictions.
Because such Label is not available for Future dates, the "Apply model" operator will never be able to calculate predictions for future dates.
To illustrate this conclusion, take another look at the first picture in this post. In order to generate this picture I added future dates with fake values ("0", i.e. zero) for all attributes beyond Sep 28, including zero values for the Labels (oilLast) on Sep 29 to Oct 5. The "Apply model" operator uses these future (fake) Labels to predict on these future dates. Therefore all predictions beyond Sep 28 have a value of 7.667, based on a Label with a "0" value for these future dates. As stated before: I suspect that "Apply model" always needs a valid Label in order to predict.
Either that is the explanation of the problem addressed in this topic or my implementation of "Apply model" is not correct.
If the latter is the case, please reply with an example model in XML that implements an "Apply model" operator that predicts beyond the scope of a source data set.
Best regards,
Luc
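To make the windowing question above concrete, here is a minimal sketch in Python (not RapidMiner) of how a window plus a horizon splits a series into rows with a known label and rows whose label lies beyond the data. The helper names and `mean_model` are hypothetical; this only illustrates the general supervised-learning setup and does not confirm how the "Apply model" operator behaves internally.

```python
def make_windows(series, window, horizon):
    """Build (features, label) rows; label is None when it lies beyond the data."""
    rows = []
    for end in range(window - 1, len(series)):
        features = series[end - window + 1 : end + 1]
        target = end + horizon
        label = series[target] if target < len(series) else None
        rows.append((features, label))
    return rows


def mean_model(features):
    """Hypothetical stand-in for a trained model: it reads features only."""
    return sum(features) / len(features)


series = [50.0, 50.5, 49.8, 49.6, 50.1, 50.6, 50.4]
rows = make_windows(series, window=3, horizon=2)

labelled = [row for row in rows if row[1] is not None]    # usable for training
unlabelled = [row for row in rows if row[1] is None]      # the "future" rows

# Scoring needs only the features, so the unlabelled tail can still be predicted.
future_predictions = [mean_model(features) for features, _ in unlabelled]
print(len(labelled), len(unlabelled), future_predictions)
```

In this sketch the last `horizon` rows have no label, yet they still yield predictions because the model function never looks at the label. Filling in zeros as fake attribute values, as described above, changes the features and would therefore distort the predictions.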
OK I spent some time on this. Let me know what you think.
Scott
I have done some additional testing.
The model that Thomas repaired regarding Remember/Recall was based on an implementation of shifting dates.
I noticed that RapidMiner uses global implementations of Java variables. Macros aren't storage locations; they're pointers to global variables.
Using that knowledge I changed the dates elsewhere, in front of the Loop operator. These values won't change in the Loop because they're global.
I included the results in the next picture. One can change dates, shift a date forwards, backwards, anywhere. The pictures are based on a horizon of 10 days. The Apply model will just use these dates as an ID. I noticed that in my previous topic. But the "Apply model" operator doesn't predict beyond the scope of the source data set, whatever the dates of those examples are or will be within the process,
because of the absence of a future Label in the scope of the source data set.
Shifting dates 10 days backwards, Sep 28 becomes Sep 18
Sept 28 becomes Oct 8
Example of a glo
Well, the only thing I did in this XML process is change the source data to my MySQL-based example sets.
As you know, that source example set has values until Sep 28.
These are the results. See the following pictures.
The only result example set in this process is provided by the operator Sort (2).
The scope of that prediction does not go beyond the source data set, in my source data set Sep 28.
Sort (2) result example set.
Hello. So I guess by your posts that you did run the process I built. The predictions for 10 days forward are there in that screenshot - they are just not in new rows. If you look at the column labeled "prediction(10 days forward)", that column represents the predicted price of oilLast 10 days AFTER the date listed in oilDate. So for example, on September 15, prediction(10 days forward)=50.077. Hence this is the prediction for oilLast for 10 days after September 15. By my calculations, this is not Sept 25 because these prices are only listed 5 out of 7 days per week. Hence this is showing that oilLast, according to this model, will be 50.077 on Sept 29 and so on...
Oct 12: 52.473
Oct 11: 52.258
Oct 10: 52.336
Oct 9: 51.892
Oct 6: 50.454
Oct 5: 50.648
Oct 4: 49.682
Oct 3: 49.479
Oct 2: 49.606
Sept 29: 50.077
That's why you see no values in the "10 days forward" column there - it has not happened yet in your data set. Yes I could have spent some time moving all that around so that it actually looks like what I typed above...
Scott
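The date arithmetic in the post above (10 trading days after September 15 is September 29, not September 25) can be checked with a small Python sketch. It assumes the year is 2017, in which September 15 falls on a Friday, and it only skips Saturdays and Sundays; exchange holidays are not handled. The function name `add_business_days` is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start, n):
    """Return the date n weekdays (Mon-Fri) after start; holidays are ignored."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:   # 0..4 are Monday..Friday
            remaining -= 1
    return current


horizon = 10
# Assumed year: 2017.
print(add_business_days(date(2017, 9, 15), horizon))  # 2017-09-29
print(add_business_days(date(2017, 9, 28), horizon))  # 2017-10-12
```

Under these assumptions the row dated Sept 28 maps to Oct 12, which matches the first entry of the list above.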
Solved with ARIMA Trainer & Apply Forecast.
Thank you for your support.
Dear @luc_bartkowski,
if you have any feedback on the ARIMA operators, please post it here with @tftemme in "CC". We are happy for any feedback on this extension which is work in progress.
Cheers,
Martin
Dortmund, Germany
Very nice @luc_bartkowski!
The nice thing about prediction operators like SVM and neural nets is that they are multivariable.
In stock trading terms: amplitudes of the Moving Average and trading volumes probably have a correlation.
ARIMA is univariable, but it is the only operator able to predict a real future.
What I am going to do to enable multivariable future predictions is:
Feed the multivariable prediction operator with real multivariable data and, alongside it, the univariable predictions related to each attribute, i.e. the prediction output of an ARIMA model. I will train that model with real data. Yes, therefore I have to wait until the future is past and I have obtained the labels to train on. Yes, I know, the resulting prediction will have a lag. The label data cannot be newer than now(). None of us has real multivariable data from the future. But one can optimize a prediction.
What happens if the p, d, q values used in ARIMA change? Well, I guess that the multivariable prediction operator will get improved data to train its model until now(), with training data for the future minus the horizon. It is, and always will be, the future we want to predict. We have to make a guess. We therefore ask ARIMA for a prediction; it's ARIMA's best guess. The multivariable prediction operator will train on it with a target label until now(), aka prediction horizon minus horizon.
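The plan above can be sketched as a training table in which each row combines the real attributes known at time t, a univariable forecast made at time t for t + horizon, and the label observed at t + horizon. The Python below is only an illustration of that layout: `naive_forecast` is a hypothetical stand-in for an ARIMA forecast (it just repeats the last value), and the attribute names and numbers are made up.

```python
def naive_forecast(past_values, horizon):
    """Hypothetical stand-in for an ARIMA forecast: repeat the last known value."""
    return past_values[-1]


def build_training_rows(prices, volumes, horizon, min_history):
    """One row per time t whose label at t + horizon is already known."""
    rows = []
    for t in range(min_history, len(prices) - horizon):
        past = prices[: t + 1]   # only data available at time t
        rows.append({
            "price": prices[t],
            "volume": volumes[t],
            "univariate_forecast": naive_forecast(past, horizon),
            "label": prices[t + horizon],
        })
    return rows


prices = [49.5, 49.7, 50.1, 50.6, 50.4, 51.9]    # made-up values
volumes = [1200, 1100, 1500, 1700, 1300, 1600]   # made-up values
rows = build_training_rows(prices, volumes, horizon=2, min_history=1)
print(len(rows), rows[-1])
```

The loop stops `horizon` rows before the end of the data, which is the lag mentioned above: the newest usable label is the value at now(), so the newest training row dates from now() minus the horizon.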