Integrating different interval time series data and accommodating for lead/lag time
Hi, new user of RapidMiner here. I've only had it for a few days and have already learned more by following the tutorials and through trial and error than in weeks of trying to do the same with Python. Big thanks to Thomas Ott for the series on building AI market models, which was incredibly helpful. The problem I'm facing comes from a lack of knowledge, and I thought the easiest way to gain that knowledge is to ask.
I'm trying to build a process which forecasts future foreign exchange values, a pointless endeavour maybe, but it's fun. The time series data I'm working with has different time intervals, such as end-of-day data and end-of-month data. Is there a standard way to put these together? I'm wary of going overkill on the details here. The range would be the same, i.e., Jan 2010 - present, regardless of the periodicity.
The second thing (and I don't know if it is a thing) is that the economic indicators I'm looking at affect (if at all) my label (i.e., the monthly closing price of EUR/USD) on different time scales. Some are leading, others coincident or lagging. Do I need to tell my process that a certain leading indicator isn't likely to affect a given price for 2 or 3 months? Or that a moving average is a reflection of events that have already passed?
Release dates vs. periods covered are also confusing the heck out of me. For example, the OECD releases certain reports roughly 6 weeks after the start of the month they cover (or 2 weeks after it ends, if that's easier), so currently data for Feb 17 is out and March's data won't be released until midway through April. Are there any steps I need to take when creating a process to accommodate these factors?
Thanks in advance and if I haven't explained something clearly or more details are needed let me know,
Alex
Best Answer
Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert)
Hi Alex,
Did you try the Fin and Econ extension? There's a way to do rebasing in that extension. If you haven't already, make sure to download and install the Series extension.
Of course you can always roll up the times into one period, but it will probably take a few operators to do so.
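To make the roll-up idea concrete outside RapidMiner, here is a minimal Python sketch that reduces end-of-day data to one observation per month by keeping the last available day. The function name `roll_up_to_month_end` and the sample closes are made up for illustration; this is not the operator chain itself, only the logic it would need to reproduce.

```python
from datetime import date

def roll_up_to_month_end(daily):
    """Keep the last available observation in each calendar month.

    `daily` is an iterable of (date, value) pairs in any order.
    Returns a dict mapping (year, month) -> (date, value).
    """
    monthly = {}
    for day, value in sorted(daily):
        # Rows are visited in date order, so later days overwrite earlier ones.
        monthly[(day.year, day.month)] = (day, value)
    return monthly

# Invented EUR/USD closes, for illustration only.
daily_closes = [
    (date(2017, 1, 30), 1.0695),
    (date(2017, 1, 31), 1.0798),
    (date(2017, 2, 27), 1.0587),
    (date(2017, 2, 28), 1.0576),
]
monthly_closes = roll_up_to_month_end(daily_closes)
```

Once every series is on the same monthly key, the daily and monthly data can be joined on that key.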
Leading, coincident, and lagging indicators are incredibly tricky to apply in practice. I'm usually of the opinion that if a lagging indicator that's lagged 6 months is posted publicly, I put it into the model on the day it was released. From there you can try to forecast the lagging indicator with a process like the XML one at the end of this post (you'll have to tweak this process).
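The "use it on the day it was released" rule can be sketched in Python as an as-of lookup: for each model date, take the most recent figure whose release date is on or before that date. The helper `latest_released`, the tuple layout, and the indicator values are assumptions for illustration. The release dates only mimic the roughly two-weeks-after-month-end pattern described in the question.

```python
from bisect import bisect_right
from datetime import date

def latest_released(releases, as_of):
    """Return the most recent release on or before `as_of`, or None.

    `releases` is a list of (release_date, period_covered, value) tuples
    sorted by release_date.
    """
    release_dates = [r[0] for r in releases]
    idx = bisect_right(release_dates, as_of)  # number of releases up to as_of
    return releases[idx - 1] if idx else None

# Hypothetical indicator published about two weeks after the covered
# month ends. Values are invented.
releases = [
    (date(2017, 2, 14), "2017-01", 100.1),
    (date(2017, 3, 14), "2017-02", 100.2),
    (date(2017, 4, 13), "2017-03", 100.4),
]

# At the March month-end only the February figure is publicly known.
known = latest_released(releases, date(2017, 3, 31))
```

Keying on the release date rather than the period covered keeps the model from seeing a number before it was actually published.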
The better application is finding out which indicator, or collection of economic indicators, works best. For that you can do something like Feature Selection, which is another process altogether.
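As a rough first screen before a proper feature selection run, one option is to scan a few lead times and see at which lag an indicator lines up best with the label. The Python sketch below does that with a plain Pearson correlation. It is a simple stand-in, not RapidMiner's Feature Selection, and `best_lag`, `pearson`, and the toy series are made up for illustration. It assumes equal-length, non-constant series.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def best_lag(indicator, label, max_lag):
    """Return (lag, correlation) with the largest absolute correlation.

    A lag of k pairs indicator[t] with label[t + k], i.e. the indicator
    leads the label by k periods.
    """
    scores = {}
    for lag in range(max_lag + 1):
        xs = indicator[: len(indicator) - lag]
        ys = label[lag:]
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]

# Toy data: the label is the indicator delayed by two periods.
indicator = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
label = [0, 0] + indicator[:-2]
lag, corr = best_lag(indicator, label, max_lag=3)
```

On real economic data a high correlation at some lag can easily be spurious, so a scan like this is only a hint about which lagged copies of an indicator are worth feeding into the feature selection step.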
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros>
<macro>
<key>horizon</key>
<value>5</value>
</macro>
<macro>
<key>symbol</key>
<value>XOM</value>
</macro>
<macro>
<key>start_date</key>
<value>2016-01-01</value>
</macro>
<macro>
<key>end_date</key>
<value>2017-03-21</value>
</macro>
</macros>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="false" class="optimize_parameters_grid" compatibility="7.4.000" expanded="true" height="103" name="Optimize Parameters (Grid)" width="90" x="581" y="238">
<list key="parameters">
<parameter key="SVM.kernel_gamma" value="[0.001;1000;6;logarithmic]"/>
<parameter key="SVM.C" value="[0;1000;10;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="112" y="34">
<parameter key="training_window_width" value="20"/>
<parameter key="training_window_step_size" value="1"/>
<parameter key="test_window_width" value="20"/>
<parameter key="horizon" value="%{horizon}"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="7.4.000" expanded="true" height="124" name="SVM" width="90" x="179" y="34"/>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.4.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
<parameter key="horizon" value="%{horizon}"/>
<parameter key="main_criterion" value="prediction_trend_accuracy"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log" width="90" x="313" y="85">
<parameter key="filename" value="tmp"/>
<list key="log">
<parameter key="Gamma" value="operator.SVM.parameter.kernel_gamma"/>
<parameter key="C" value="operator.SVM.parameter.C"/>
<parameter key="Forecast Perf" value="operator.Validation.value.performance"/>
</list>
</operator>
<connect from_port="input 1" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="quantx1:yahoo_historical_data_extractor" compatibility="1.0.006" expanded="true" height="82" name="Yahoo Historical Stock Data" width="90" x="45" y="34">
<parameter key="I agree to abide by Yahoo's Terms &amp; Conditions on financial data usage" value="true"/>
<parameter key="Quick Stock Ticker Data" value="true"/>
<parameter key="Stock Ticker" value="%{symbol}"/>
<parameter key="select_fields" value="CLOSE"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="date_start" value="%{start_date}"/>
<parameter key="date_end" value="%{end_date}"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="rename" compatibility="7.4.000" expanded="true" height="82" name="Rename" width="90" x="179" y="136">
<parameter key="old_name" value="%{symbol}_CLOSE"/>
<parameter key="new_name" value="Close"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="238">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Close|Date"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="6.4.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="340">
<parameter key="condition_class" value="no_missing_attributes"/>
<list key="filters_list"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="380" y="34">
<parameter key="window_size" value="6"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="Close"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="380" y="136">
<parameter key="window_size" value="6"/>
<parameter key="label_attribute" value="Close"/>
</operator>
<operator activated="true" class="extract_macro" compatibility="7.4.000" expanded="true" height="68" name="Extract Macro" width="90" x="380" y="238">
<parameter key="macro" value="n_examples"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="generate_macro" compatibility="7.4.000" expanded="true" height="82" name="Generate Macro" width="90" x="380" y="340">
<list key="function_descriptions">
<parameter key="filter_range" value="eval(%{n_examples})-1"/>
</list>
</operator>
<operator activated="true" class="filter_example_range" compatibility="7.4.000" expanded="true" height="82" name="Filter Example Range" width="90" x="380" y="442">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="%{filter_range}"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator activated="true" class="remember" compatibility="7.4.000" expanded="true" height="68" name="Remember" width="90" x="514" y="442">
<parameter key="name" value="LastRow"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation For the Masses" width="90" x="514" y="34">
<parameter key="training_window_width" value="8"/>
<parameter key="training_window_step_size" value="1"/>
<parameter key="test_window_width" value="8"/>
<parameter key="horizon" value="%{horizon}"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="7.4.000" expanded="true" height="124" name="SVM (2)" width="90" x="217" y="34">
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="0.01"/>
<parameter key="C" value="1000.0"/>
</operator>
<connect from_port="training" to_op="SVM (2)" to_port="training set"/>
<connect from_op="SVM (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.4.000" expanded="true" height="82" name="Apply Model (3)" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
<parameter key="horizon" value="%{horizon}"/>
<parameter key="main_criterion" value="prediction_trend_accuracy"/>
</operator>
<connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
<connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop" compatibility="7.4.000" expanded="true" height="82" name="Loop" width="90" x="648" y="34">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="macro_name" value="loop_forecasts"/>
<parameter key="iterations" value="%{horizon}"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="7.4.000" expanded="true" height="68" name="Recall" width="90" x="45" y="85">
<parameter key="name" value="LastRow"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="246" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.4.000" expanded="true" height="82" name="Generate Attributes" width="90" x="380" y="34">
<list key="function_descriptions">
<parameter key="Date" value="date_add(Date,eval(%{loop_forecasts}),DATE_UNIT_DAY)"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="82" name="Set Role (2)" width="90" x="514" y="34">
<parameter key="attribute_name" value="prediction(label)"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes (3)" width="90" x="648" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="prediction(label)"/>
</operator>
<operator activated="true" class="replace" compatibility="7.4.000" expanded="true" height="82" name="Replace" width="90" x="782" y="34">
<parameter key="replace_what" value="Close"/>
<parameter key="replace_by" value="$1-"/>
</operator>
<operator activated="true" class="materialize_data" compatibility="7.4.000" expanded="true" height="82" name="Materialize Data (2)" width="90" x="916" y="34"/>
<connect from_port="input 1" to_op="Apply Model" to_port="model"/>
<connect from_op="Recall" from_port="result" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Materialize Data (2)" to_port="example set input"/>
<connect from_op="Materialize Data (2)" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.4.000" expanded="true" height="82" name="Append" width="90" x="782" y="34"/>
<connect from_op="Yahoo Historical Stock Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation For the Masses" to_port="training"/>
<connect from_op="Windowing" from_port="original" to_op="Windowing (2)" to_port="example set input"/>
<connect from_op="Windowing (2)" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
<connect from_op="Generate Macro" from_port="through 1" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_op="Remember" to_port="store"/>
<connect from_op="Validation For the Masses" from_port="model" to_op="Loop" to_port="input 1"/>
<connect from_op="Validation For the Masses" from_port="averagable 1" to_port="result 2"/>
<connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="231"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>