Caching Data within a Process
Problem
When building processes, you sometimes want to store temporary ExampleSets in the repository so that you don't need to re-run long-running processes over and over again. This comes up especially when some processes depend on the results of others.
Idea
Create a library process that executes a given process only if its output isn't stored in the repository yet. Otherwise, it just reads the output from the repository.
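The idea is independent of the tool, so it can help to see it as plain code first. The following is a minimal Python sketch of the same "try to read, otherwise compute and store" logic. It is an analogy only, not part of the process itself: the function name `load_or_compute`, the `compute` callback, and the use of a pickle file as a stand-in for a repository entry are all assumptions made for illustration.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_compute(cache_path: Path, compute: Callable[[], Any]) -> Any:
    """Return the cached result if it can be read, otherwise compute and store it.

    Hypothetical helper: a pickle file stands in for the repository entry.
    """
    try:
        # Try to read the cached result first.
        with cache_path.open("rb") as fh:
            result = pickle.load(fh)
        print("reading from cache")
        return result
    except (OSError, pickle.UnpicklingError, EOFError):
        # Reading failed (e.g. no entry yet): run the expensive step ...
        result = compute()
        print("creating cache")
        # ... and store its output at the cache location for the next run.
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with cache_path.open("wb") as fh:
            pickle.dump(result, fh)
        return result
```

Note that the sketch does not check whether the entry exists before reading. It simply attempts the read and treats a failure as "not cached yet", which keeps the logic to a single try/fallback pair.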
Solution
Before we can start creating the process, we need to set up Studio to show the "Context" panel. To do so, go to "View -> Show Panel" and select "Context".
Overview
Step-by-Step Walkthrough
Usage
To illustrate the usage, let's have a look at a sample repository:
Example Repository
Caching configuration
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros>
<macro>
<key>repo_path</key>
<value>../../data/temp/cleaned_data</value>
</macro>
<macro>
<key>path_to_process</key>
<value>../clean_data</value>
</macro>
</macros>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="handle_exception" compatibility="8.0.001" expanded="true" height="82" name="Handle Exception" width="90" x="179" y="187">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve" width="90" x="112" y="34">
<parameter key="repository_entry" value="%{repo_path}"/>
</operator>
<operator activated="true" class="print_to_console" compatibility="8.0.001" expanded="true" height="82" name="Print to Console" width="90" x="246" y="34">
<parameter key="log_value" value="reading from cache"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Print to Console" to_port="through 1"/>
<connect from_op="Print to Console" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="productivity:execute_process" compatibility="8.0.001" expanded="true" height="82" name="Execute Process" width="90" x="45" y="34">
<parameter key="process_location" value="%{path_to_process}"/>
<parameter key="cache_process" value="false"/>
<list key="macros"/>
<description align="center" color="transparent" colored="false" width="126">process to be executed if repo entry is not available</description>
</operator>
<operator activated="true" class="print_to_console" compatibility="8.0.001" expanded="true" height="82" name="Print to Console (2)" width="90" x="179" y="34">
<parameter key="log_value" value="creating cache"/>
</operator>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store" width="90" x="313" y="34">
<parameter key="repository_entry" value="%{repo_path}"/>
<description align="center" color="transparent" colored="false" width="126">save output of process to cache location</description>
</operator>
<connect from_port="in 1" to_op="Execute Process" to_port="input 1"/>
<connect from_op="Execute Process" from_port="result 1" to_op="Print to Console (2)" to_port="through 1"/>
<connect from_op="Print to Console (2)" from_port="through 1" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">try to load data</description>
</operator>
<connect from_op="Handle Exception" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="left" color="yellow" colored="false" height="298" resized="true" width="277" x="80" y="40">#1 path and filename of the temp. data set are set via a macro in the context (View -&gt; Show Panel -&gt; Context)&lt;br&gt;&lt;br&gt;#2 If no data can be found under the path of location #1 a process is executed. The path to the process is also defined in the context.</description>
</process>
</operator>
</process>