"Financial Time Series Prediction"
Hi, I am a new RapidMiner user. I would like to build a financial time series model to forecast future stock market values (mainly the Hang Seng Index in Hong Kong). I have searched this forum for answers, but I am still confused even after reading other posts and watching the video tutorials made by Thomas Ott (#8-#10, the tutorials about financial time series).
Here are the questions that have confused me for the past few days:
1. Some posts in the forum mention that windowing can help generate multivariate data, but how does it help me predict values beyond the end of my data? (For example, I have data for Days 1 to 10 and would like to predict the values for Days 11 to 20.)
2. Other than Windowing, which operators should I add to my process to get these out-of-sample predictions?
3. Is there any way to link the optimization operator (Grid) with the SVM operator inside the Sliding Window Validation operator? In my current model, I first have to run the optimization operator, copy the "optimized" values into the SVM operator inside the validation operator, and then run the process again. Is there a more convenient way to organize my process?
4. I am quite surprised that the default SVM setting (kernel type: dot) is quite accurate at predicting stock market performance, but when I use the optimization operator to tune the "C" and "gamma" values and apply them to my test data, the accuracy is usually low. Are there any better operators for predicting market performance?
Thanks for your patience in reading all my questions. I am a student majoring in social science, so I am not very familiar with the data mining process, but I am willing to learn.
Below is the XML of my current model. It would be great if you could give me some ideas on how to improve it. Thanks!
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.2.002" expanded="true" height="68" name="Read Excel" width="90" x="45" y="187">
<parameter key="excel_file" value="/Users/TszShingLusaion/Desktop/HSBC(Training).xls"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.2.002" expanded="true" height="82" name="Set Role" width="90" x="179" y="187">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="Close" value="label"/>
</list>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.2.000" expanded="true" height="82" name="Windowing" width="90" x="313" y="187">
<parameter key="window_size" value="1"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="Close"/>
</operator>
<operator activated="true" class="read_excel" compatibility="7.2.002" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="493">
<parameter key="excel_file" value="/Users/TszShingLusaion/Desktop/HSBC(Out_exp).xlsx"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.2.002" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="493">
<parameter key="attribute_name" value="Close"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="Date" value="id"/>
</list>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.2.000" expanded="true" height="82" name="Windowing (2)" width="90" x="313" y="493">
<parameter key="window_size" value="1"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="Close"/>
<parameter key="horizon" value="0"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="7.2.000" expanded="true" height="124" name="Validation" width="90" x="447" y="187">
<parameter key="training_window_width" value="50"/>
<parameter key="test_window_width" value="50"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="7.2.002" expanded="true" height="124" name="SVM (2)" width="90" x="112" y="34">
<parameter key="kernel_gamma" value="0.001"/>
<parameter key="C" value="1.584893192461114"/>
</operator>
<connect from_port="training" to_op="SVM (2)" to_port="training set"/>
<connect from_op="SVM (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.2.002" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="7.2.002" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="34"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="7.2.002" expanded="true" height="82" name="Apply Model (2)" width="90" x="581" y="289">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="legacy:write_model" compatibility="7.2.002" expanded="true" height="68" name="Write Model" width="90" x="581" y="442">
<parameter key="model_file" value="/Users/TszShingLusaion/Desktop/successprediction.mod"/>
</operator>
<operator activated="true" class="read_excel" compatibility="7.2.002" expanded="true" height="68" name="Read Excel (3)" width="90" x="45" y="34">
<parameter key="excel_file" value="/Users/TszShingLusaion/Desktop/HSBC(Training).xls"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.2.002" expanded="true" height="82" name="Set Role (3)" width="90" x="179" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="Close" value="label"/>
</list>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.2.000" expanded="true" height="82" name="Windowing (3)" width="90" x="313" y="34">
<parameter key="window_size" value="1"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="Close"/>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="7.2.002" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="447" y="34">
<list key="parameters">
<parameter key="SVM.C" value="[0.001;100000;10;logarithmic]"/>
</list>
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="7.2.002" expanded="true" height="124" name="SVM" width="90" x="112" y="34"/>
<operator activated="true" class="apply_model" compatibility="7.2.002" expanded="true" height="82" name="Apply Model (3)" width="90" x="246" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="7.2.000" expanded="true" height="82" name="Performance" width="90" x="380" y="34"/>
<connect from_port="input 1" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_op="Apply Model (3)" to_port="model"/>
<connect from_op="SVM" from_port="exampleSet" to_op="Apply Model (3)" to_port="unlabelled data"/>
<connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance"/>
<connect from_op="Performance" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
<connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="training" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 4"/>
<connect from_op="Apply Model (2)" from_port="model" to_op="Write Model" to_port="input"/>
<connect from_op="Write Model" from_port="through" to_port="result 3"/>
<connect from_op="Read Excel (3)" from_port="output" to_op="Set Role (3)" to_port="example set input"/>
<connect from_op="Set Role (3)" from_port="example set output" to_op="Windowing (3)" to_port="example set input"/>
<connect from_op="Windowing (3)" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 5"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
</process>
</operator>
</process>
Best Answers
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
Hi Richard,
I suggest you check out Chapter 10 of Predictive Analytics and Data Mining by Vijay Kotu and Bala Deshpande; it explains how to forecast multiple time periods ahead. To answer your specific questions, see below:
1. You will need to use Windowing operators for both your training and testing sets. The book referenced above discusses how to build your training and testing sets.
2. This depends. If you're trying to predict the closing prices of a market index, you may want to use a Lag Series operator or a Moving Average operator. I use those operators quite a bit for volatility forecasting.
3. Yes, you sure can. Just embed the Sliding Window Validation operator inside the Grid Optimization operator, and you can tune the training/testing windows and their widths as well. I do that quite a bit. The window size and step size make all the difference in getting a more accurate model.
4. I usually use an SVM with an RBF kernel and then fine-tune gamma and C in a Grid Optimization operator. I find that this gives me a more stable model.
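The windowing and nested-validation ideas in answers 1 and 3 can be sketched outside RapidMiner. The following is a minimal pure-Python illustration, not the RapidMiner implementation: the function names (`make_windows`, `sliding_splits`, `fit_drift`, `grid_search`) are hypothetical, and a simple drift model stands in for the SVM so that the example runs without extra libraries.

```python
def make_windows(series, window_size, horizon=1):
    """Turn a series into (features, label) pairs: each window of past
    values is paired with the value `horizon` steps after the window."""
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        examples.append((features, label))
    return examples


def sliding_splits(n_examples, train_width, test_width, step):
    """Time-ordered train/test index ranges; the test block always
    follows its training block, so no future data leaks into training."""
    splits = []
    start = 0
    while start + train_width + test_width <= n_examples:
        train_idx = range(start, start + train_width)
        test_idx = range(start + train_width,
                         start + train_width + test_width)
        splits.append((train_idx, test_idx))
        start += step
    return splits


def fit_drift(train):
    """Stand-in learner (NOT an SVM): average one-step change."""
    diffs = [label - feats[-1] for feats, label in train]
    return sum(diffs) / len(diffs)


def validate(examples, train_width, test_width, step):
    """Mean absolute error over all sliding test blocks, or None
    when the data is too short for even one split."""
    errors = []
    for train_idx, test_idx in sliding_splits(
            len(examples), train_width, test_width, step):
        drift = fit_drift([examples[i] for i in train_idx])
        for i in test_idx:
            feats, label = examples[i]
            errors.append(abs(feats[-1] + drift - label))
    return sum(errors) / len(errors) if errors else None


def grid_search(series, window_sizes, train_widths, test_width=2):
    """The validation runs INSIDE the grid loop, so every parameter
    combination is scored in a single pass."""
    best = None
    for window_size in window_sizes:
        examples = make_windows(series, window_size)
        for train_width in train_widths:
            score = validate(examples, train_width, test_width,
                             step=test_width)
            if score is not None and (best is None or score < best[0]):
                best = (score, window_size, train_width)
    return best  # (error, window_size, train_width)
```

Because the validation is nested inside the search loop, there is no need to run the optimization first and copy the resulting values into a second process by hand.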
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
Hi Richard,
To use the Lag Series operator, you just need to edit the parameter field and select which attribute (column) you want to lag and by what value. So if I want to lag an attribute called S&P500_weekly by 2 weeks, then I'd select that column in the Lag Series operator and enter the value 2.
The Moving Average operator is similar, just select the attribute you want to calculate the moving average on and select the "time window." If your data is in days and you want a 20DMA, just enter 20 for the time window.
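As a rough illustration of what these two transformations compute, here is a minimal pure-Python sketch. It is not the RapidMiner operators themselves: `lag_series` and `moving_average` are hypothetical helper names, and filling positions that lack enough history with `None` is an assumption made for this example.

```python
def lag_series(values, lag):
    """Shift a column by `lag` rows, so row i holds the value from row
    i - lag. The first `lag` rows have no history and become None."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [None if i < lag else values[i - lag]
            for i in range(len(values))]


def moving_average(values, window):
    """Trailing moving average over the last `window` rows.
    Rows with fewer than `window` observations become None."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running_sum -= values[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


closes = [10, 11, 12, 13]
lagged = lag_series(closes, 2)          # lag by 2 periods
smoothed = moving_average(closes, 3)    # 3-period moving average
```

With daily data, `moving_average(closes, 20)` would correspond to the 20DMA mentioned above.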
FYI, here's a great graphic on how Gamma and C interact.
[Image: Gamma vs C - SVM]
Answers
Thanks Thomas for your suggestions! I will work in this direction and see what I can find.
May I ask for some more instructions on how to add the Lag Series and Moving Average operators to the process, please? I am quite lost with these terms. Thanks!
Thanks TBone!
I have made great progress in building the model; however, a new question has come up.
The following is quoted from Chapter 10 of the book you recommended to me.
"...But before we start the looping, we need to store the last forecasted row in a separate data structure. This is accomplished by the macro titled Extract Example Set. The Filter Example operator simply deletes all rows of the transformed data set except the last forecasted row. (P.324)"
How do I set the Filter Example operator to "delete all rows of the transformed data set except the last forecasted row"? I guess it is related to the "custom filter" option, but I got lost right after I clicked in...
My other question is about the "Adjust Date" operator: how do I set the adjustment in order to generate new dates? Thanks!
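The two steps asked about here (keeping only the last row, then stepping the date forward for each new forecast) can be sketched conceptually in plain Python. This is only an illustration of the looping idea under stated assumptions, not the book's process or the RapidMiner operators: all function names are hypothetical, weekends are skipped but market holidays are ignored, and a simple mean stands in for the trained model.

```python
from datetime import date, timedelta


def last_window(series, window_size):
    """Keep only the most recent `window_size` values, i.e. the
    'last row' that seeds the forecasting loop."""
    return list(series[-window_size:])


def next_business_days(last_date, count):
    """Generate the next `count` weekdays after `last_date`.
    Assumption: exchange holidays are NOT handled here."""
    days = []
    current = last_date
    while len(days) < count:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
    return days


def mean_of_window(window):
    """Stand-in for a trained model's prediction (NOT an SVM)."""
    return sum(window) / len(window)


def recursive_forecast(seed_window, last_date, steps, predict):
    """Forecast `steps` periods ahead by feeding each prediction
    back into the window as if it were an observed value."""
    window = list(seed_window)
    forecasts = []
    for day in next_business_days(last_date, steps):
        value = predict(window)
        forecasts.append((day, value))
        # Slide the window: drop the oldest value, append the forecast.
        window = window[1:] + [value]
    return forecasts


closes = [5.0, 1.0, 2.0, 3.0]
seed = last_window(closes, 3)
forecasts = recursive_forecast(seed, date(2016, 9, 2), 3, mean_of_window)
```

Because each forecast is fed back in as an input, errors compound with every step, so forecasts far beyond the last observed day deserve extra caution.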