Horizon parameter of the Windowing operator in time series forecasting
Hello,
I hope someone can help me with my problem.
My task is to create a sales forecast for several products using time series analysis. For this I use the process below, inspired by Thomas Ott.
So I have a question:
In the "Windowing" operator, is it possible to choose a horizon of 0? With this setting I get an accuracy of 0.935 and a meaningful graph.
Is that allowed? All the examples I found used horizon=1. But when I change it to 1, the results are worse and the lines in the graph do not match, because with horizon=1 the predicted value is not in the same row as the actual value.
I hope someone can help me.
Best regards
Tina
(Sorry for my poor English)
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\Users\Tina Grundmann\Dropbox\3MBACS\Projekt\RapidMiner\daten\trainingsdaten.xlsx"/>
<parameter key="imported_cell_range" value="A1:F66"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="1A3=1-1 Material Verbrauch.true.integer.attribute"/>
<parameter key="1" value="1A3=1-2 Material Verbrauch.true.integer.attribute"/>
<parameter key="2" value="1A3=2-1 Material Verbrauch.true.integer.attribute"/>
<parameter key="3" value="1A3=4-2 Material Verbrauch.true.integer.attribute"/>
<parameter key="4" value="3Z8 Material Verbrauch.true.integer.attribute"/>
<parameter key="5" value="Monat\.KalJahr.true.date_time.id"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="7.3.001" expanded="true" height="82" name="Set Role" width="90" x="45" y="136">
<parameter key="attribute_name" value="Monat.KalJahr"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="1A3=1-1 Material Verbrauch|Monat.KalJahr"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.3.000" expanded="true" height="82" name="Windowing" width="90" x="313" y="34">
<parameter key="window_size" value="2"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="1A3=1-1 Material Verbrauch"/>
<parameter key="horizon" value="0"/>
<parameter key="stop_on_too_small_dataset" value="false"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="7.3.000" expanded="true" height="124" name="Validation" width="90" x="447" y="34">
<parameter key="training_window_width" value="32"/>
<parameter key="training_window_step_size" value="1"/>
<parameter key="test_window_width" value="32"/>
<process expanded="true">
<operator activated="true" class="neural_net" compatibility="7.3.001" expanded="true" height="82" name="Neural Net" width="90" x="112" y="34">
<list key="hidden_layers"/>
</operator>
<operator activated="false" class="polynomial_regression" compatibility="7.3.001" expanded="true" height="82" name="Polynomial Regression" width="90" x="112" y="136"/>
<connect from_port="training" to_op="Neural Net" to_port="training set"/>
<connect from_op="Neural Net" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.3.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="7.3.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<parameter key="horizon" value="1"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="238">
<parameter key="excel_file" value="C:\Users\Tina Grundmann\Dropbox\3MBACS\Projekt\RapidMiner\daten\evaluierungsdaten.xlsx"/>
<parameter key="imported_cell_range" value="A1:F28"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="1A3=1-1 Material Verbrauch.true.integer.attribute"/>
<parameter key="1" value="1A3=1-2 Material Verbrauch.true.integer.attribute"/>
<parameter key="2" value="1A3=2-1 Material Verbrauch.true.integer.attribute"/>
<parameter key="3" value="1A3=4-2 Material Verbrauch.true.integer.attribute"/>
<parameter key="4" value="3Z8 Material Verbrauch.true.integer.attribute"/>
<parameter key="5" value="Monat\.KalJahr.true.date_time.id"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="7.3.001" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="136">
<parameter key="attribute_name" value="Monat.KalJahr"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="179" y="238">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="1A3=1-1 Material Verbrauch|Monat.KalJahr"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.3.000" expanded="true" height="82" name="Windowing (2)" width="90" x="313" y="238">
<parameter key="window_size" value="2"/>
<parameter key="label_attribute" value="1A3=1-1 Material Verbrauch"/>
<parameter key="stop_on_too_small_dataset" value="false"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.3.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="447" y="238">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="581" y="187">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Monat.KalJahr|prediction(label)|1A3=1-1 Material Verbrauch-0"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="training" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
<connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
Best Answer
Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert)
The Horizon parameter is only enabled if you turn on label creation (Create Label) and choose a Label Attribute. The number you enter for Horizon determines how far ahead in the time series the label, i.e. the value you are trying to forecast, is taken. For example, let's take an unwindowed data set.
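A minimal Python sketch of the windowing idea, assuming the label is taken `horizon` steps after the most recent value in each window. `make_windows` is a hypothetical helper for illustration only, not RapidMiner's implementation, and RapidMiner's exact indexing may differ.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window_size: int, horizon: int
) -> List[Tuple[List[float], float]]:
    """Hypothetical helper: turn a univariate series into (window, label) rows.

    Assumption: the inputs are the values at positions t-window_size+1 .. t,
    and the label is the value at position t + horizon.
    """
    rows = []
    # t is the index of the most recent value inside the window; stop early
    # enough that t + horizon is still a valid index.
    for t in range(window_size - 1, len(series) - horizon):
        window = list(series[t - window_size + 1 : t + 1])
        rows.append((window, series[t + horizon]))
    return rows


sales = [10, 12, 15, 11, 14]  # made-up example data
for window, label in make_windows(sales, window_size=2, horizon=1):
    print(window, "->", label)
# [10, 12] -> 15
# [12, 15] -> 11
# [15, 11] -> 14
```

With `horizon=0` the same call returns rows whose label is simply the last value already inside the window.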
A Horizon of 0 means you offset the label by zero time units; this is why you get such a high accuracy. You are not really forecasting, just fitting the existing time series.
A Horizon of 1 means you offset the label by 1 time unit: you are building a predictive model that uses the inputs at time zero but a label taken one time unit later.
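To see why horizon 0 looks so accurate, here is a small Python sketch with made-up numbers. It scores a trivial "model" that just repeats the latest known value. `naive_mae` is a hypothetical helper, not a RapidMiner operator. With horizon 0 the prediction and the label are the same value, so the error is zero without any forecasting; with horizon 1 the error becomes nonzero.

```python
from typing import Sequence


def naive_mae(series: Sequence[float], horizon: int) -> float:
    """Mean absolute error of a trivial 'model' that predicts the value
    `horizon` steps ahead to be equal to the latest known value."""
    # Each pair is (prediction = value at t, actual = value at t + horizon).
    pairs = [(series[t], series[t + horizon]) for t in range(len(series) - horizon)]
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)


sales = [10, 12, 15, 11, 14, 13, 17]  # made-up example data
print(naive_mae(sales, horizon=0))            # 0.0
print(round(naive_mae(sales, horizon=1), 2))  # 2.83
```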
Take a look at the screenshots I posted here: http://community.rapidminer.com/t5/RapidMiner-Studio/Time-Series-using-Windowing-operator-in-RapidMiner/m-p/31791
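Regarding the graph mismatch in the question: with horizon 1, the prediction in a row refers to the next time step, not to the "-0" value in the same row. A minimal Python sketch of shifting predictions before plotting, assuming each prediction comes from the row whose latest value is `actuals[i]`; `align_for_plot` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_for_plot(
    times: Sequence[str],
    actuals: Sequence[float],
    predictions: Sequence[float],
    horizon: int,
) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: pair each prediction with the value it forecasts.

    Assumption: predictions[i] comes from the row whose most recent value is
    actuals[i] (the "-0" column), so it targets time index i + horizon.
    """
    rows = []
    for i, pred in enumerate(predictions):
        target = i + horizon
        if target < len(actuals):
            rows.append((times[target], actuals[target], pred))
        # Otherwise the target lies after the last known value: a true forecast
        # with no actual value to compare against yet.
    return rows


times = ["Jan", "Feb", "Mar", "Apr"]
actuals = [10, 12, 15, 11]
predictions = [11.5, 14.0, 12.0, 13.0]  # made-up model output
for row in align_for_plot(times, actuals, predictions, horizon=1):
    print(row)
# ('Feb', 12, 11.5)
# ('Mar', 15, 14.0)
# ('Apr', 11, 12.0)
```

The last prediction (13.0) is dropped from the plot because it forecasts the month after April, for which no actual value exists yet.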
Answers
But I wonder why the option h=0 exists at all, since with it there is no forecasting...