Problem with Store / Retrieve
hughesfleming
I am reading two CSV files generated by an application that sets up training data and out-of-sample data; these CSV files are updated daily. When I start the process in RapidMiner, I store the CSV data to the repository under a fixed entry name and then retrieve it. The Read CSV operators bring the files in correctly, but the Retrieve operator sometimes returns the previous day's stored data instead of the current day's. I don't remember having this problem under OS X when reading the CSV files from a network drive. I now see it when running my process under Windows 7 64-bit, and I have to run the process a couple of times before it returns the correctly stored data. I am at a loss, as this used to be straightforward. Does anyone have any ideas?
Many thanks,
Alex Fleming
Answers
Can you please attach a sample process in which you store and retrieve the data?
Which repository type are you using? Is it a local repository, or a remote repository on a RapidAnalytics server?
Best,
Marius
regards,
Alex
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.017">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
<process expanded="true" height="633" width="1224">
<operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV" width="90" x="179" y="120">
<parameter key="csv_file" value="C:\MT4-2\Broco Trader\experts\files\EURUSDTD,D1.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Date.true.date_time.attribute"/>
<parameter key="1" value="Time.false.binominal.attribute"/>
<parameter key="2" value="Open.true.real.attribute"/>
<parameter key="3" value="High.true.real.attribute"/>
<parameter key="4" value="Low.true.real.attribute"/>
<parameter key="5" value="Close.true.real.attribute"/>
<parameter key="6" value="ACLV.true.real.attribute"/>
<parameter key="7" value="Range1.true.real.attribute"/>
<parameter key="8" value="Range2.true.real.attribute"/>
<parameter key="9" value="Range3.true.real.attribute"/>
<parameter key="10" value="Range4.true.real.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
</operator>
<operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store" width="90" x="313" y="120">
<parameter key="repository_entry" value="Training Data"/>
</operator>
<operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV (2)" width="90" x="179" y="210">
<parameter key="csv_file" value="C:\MT4-2\Broco Trader\experts\files\EURUSDNN,D1.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Date.true.date_time.attribute"/>
<parameter key="1" value="Time.false.binominal.attribute"/>
<parameter key="2" value="Open.true.real.attribute"/>
<parameter key="3" value="High.true.real.attribute"/>
<parameter key="4" value="Low.true.real.attribute"/>
<parameter key="5" value="Close.true.real.attribute"/>
<parameter key="6" value="ACLV.true.real.attribute"/>
<parameter key="7" value="Range1.true.real.attribute"/>
<parameter key="8" value="Range2.true.real.attribute"/>
<parameter key="9" value="Range3.true.real.attribute"/>
<parameter key="10" value="Range4.true.real.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
</operator>
<operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store (2)" width="90" x="313" y="210">
<parameter key="repository_entry" value="OutofSample Data"/>
</operator>
<operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve (2)" width="90" x="514" y="210">
<parameter key="repository_entry" value="OutofSample Data"/>
</operator>
<operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="514" y="120">
<parameter key="repository_entry" value="Training Data"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Store (2)" to_port="input"/>
<connect from_op="Retrieve (2)" from_port="output" to_port="result 2"/>
<connect from_op="Retrieve" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
So basically you have your processes working now?
The process you posted above looks fine. However, since the Retrieve operators have no input port, nothing ties them to the Store operators, and the execution order can seem a bit "random" (in fact it is deterministic, but it depends on the order in which you dragged the operators onto the process). The Retrieve operators might therefore be executed before the Store operators, in which case they return the previously stored data; that would explain the seemingly strange behaviour of your process. To control the operator execution order, click the blue up-down-arrow icon at the top right of the process pane. Hint: to make an operator the first one to be executed, right-click it and select "bring to front".
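One way to avoid the ordering question altogether is to take the data from the Store operator's pass-through output instead of using a separate Retrieve. The fragment below is only a sketch of the connections for the training data. It assumes the Store operator exposes an output port named "through" (which I believe is the case in RapidMiner 5.x) and reuses the operator names from the process posted above.

```xml
<!-- Sketch: replaces the Retrieve-based wiring for the training data. -->
<!-- Assumption: Store has an output port called "through". -->
<connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="result 1"/>
```

With this wiring the data reaching the result port has necessarily passed through Store first, so it is always the freshly read CSV rather than whatever was stored on an earlier run.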
Best,
Marius
Many thanks,
Alex
To schedule the process, you might want to have a look at our RapidAnalytics server. If you think that's overkill, just create a cron job (if you are on Unix) that calls RapidMiner and passes the process you want to run as a command-line argument.
Best, Marius