"Time Series Data Mining"
I have 123 columns × 5,000 rows of financial time series data in an Excel spreadsheet. I set up a genetic algorithm for preprocessing and a neural net for learning. I would like to forecast the data, and I would also like to set up self-optimization. The series correlate at some times and not at others, and the correlation changes over the course of the series. It may hold for a few weeks to a few years, but then there are periods, again lasting a few weeks to a few years, where the correlation is gone. I would like the system to detect these changing patterns and to optimize and adjust itself to them. I only need to run this once per day to forecast the next day's values. In the future, I would like it to forecast four weeks ahead. Another goal is to run multiple versions of this system with different combinations of preprocessing and learning algorithms, and then export the forecasts to an Excel spreadsheet for comparison.
How can I set this up?
Thanks,
Amir
I have the following code so far:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource" breakpoints="after">
        <parameter key="excel_file" value="C:\Users\Amir\Desktop\Rapid Miner\TIPredictTrade.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="id_column" value="1"/>
        <parameter key="label_column" value="12"/>
    </operator>
    <operator name="GeneticAlgorithm" class="GeneticAlgorithm" expanded="yes">
        <parameter key="keep_best_individual" value="true"/>
        <parameter key="maximum_number_of_generations" value="50"/>
        <parameter key="min_number_of_attributes" value="5"/>
        <parameter key="plot_generations" value="50"/>
        <parameter key="population_size" value="21"/>
        <parameter key="show_population_plotter" value="true"/>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="NeuralNet" class="NeuralNet">
                    <parameter key="default_number_of_hidden_layers" value="3"/>
                    <list key="hidden_layer_types">
                    </list>
                    <parameter key="training_cycles" value="500"/>
                </operator>
                <operator name="ModelWriter" class="ModelWriter">
                    <parameter key="model_file" value="C:\Users\Amir\Documents\rm_workspace\DOWPredict.mod"/>
                    <parameter key="output_type" value="XML"/>
                </operator>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                    <parameter key="create_view" value="true"/>
                    <parameter key="keep_model" value="true"/>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
</operator>
Answers
When you ask "How can I set this up?", what exactly do you mean by "this"? The good news is that everything you propose has already been covered in this forum, mostly with accompanying code.
That being said, I would strongly advise you to be very clear about what the operators do and what their use implies. For example, your code uses XValidation with shuffled sampling. If you think about it, that means the model can be trained on data from after the period it is asked to predict. Modeling time series is more complicated than classifying flowers or deciding whether to play golf; modeling time series involving animate objects in a feedback loop is more complex still.
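To make the leakage concrete, here is a minimal Python sketch. It is not RapidMiner code: the function names and the fold count of 10 are assumptions for illustration only. It shuffles 5,000 row indices into folds, much as shuffled sampling would, and counts how many folds train on rows that come later than their earliest test row.

```python
import random


def shuffled_folds(n_rows, k, seed=0):
    """Shuffle row indices 0..n_rows-1 and deal them into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


def count_leaking_folds(folds):
    """Count folds whose training rows include a row later than the earliest test row."""
    leaks = 0
    for i, test in enumerate(folds):
        # Training set = every row that is not in the current test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        if max(train) > min(test):
            leaks += 1
    return leaks


# 5,000 rows as in the question; k=10 is an assumed fold count.
folds = shuffled_folds(n_rows=5000, k=10)
print(count_leaking_folds(folds), "of", len(folds), "folds train on future rows")
```

If the rows are in time order, every fold that is counted here has a model that saw the future before being asked to predict the past.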
So it could be that you are not finding constant patterns because you are not using an appropriate validation technique, such as sliding window validation. But it could also be that you are looking for a constancy that does not exist, because humans keep changing their view of the way things work. Personally I lean towards the latter.
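For reference, the idea behind sliding window validation can be sketched in a few lines of Python. This illustrates the splitting scheme only, not RapidMiner's own operator. The function name and the window sizes (250 training rows, 1 test row, step of 1) are assumptions chosen to match a "forecast the next day" setup.

```python
def sliding_window_splits(n_rows, train_width, test_width, step):
    """Yield (train, test) index ranges where every test row follows every training row."""
    start = 0
    while start + train_width + test_width <= n_rows:
        train = range(start, start + train_width)
        test = range(start + train_width, start + train_width + test_width)
        yield train, test
        start += step  # slide both windows forward in time


# Assumed sizes: train on 250 consecutive rows, test on the single next row.
splits = list(sliding_window_splits(n_rows=5000, train_width=250, test_width=1, step=1))
first_train, first_test = splits[0]
print(len(splits), "splits; first test row:", first_test[0])
```

Because the training window moves forward with the data, performance measured per window also shows when a pattern stops holding, which is the situation described in the question.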
For the last few decades unscrupulous mathematicians have made a living out of persuading more ignorant bankers that the mathematics of distributions can model human economic activity. Some even got Nobel prizes, and then went bust in the Long-Term Capital Management fiasco. Now we are all sitting in the ruins of that simple stupidity. That presents a tremendous opportunity for data mining to show its worth, for it offers the chance to extract unknown and potentially useful information at a time when all that is available is discredited dogma. But it also offers the same blandishments to the naive as the mathematicians peddled to the bankers, so be careful, very, very careful.