Caching in RapidMiner using Old World Computing's Jackhammer Extension: Cache Dependencies
Using Macros to Set Cache Dependencies
Recap
The previous two tutorials on the caching functions of the Jackhammer Extension demonstrated the basic features of the Cache operator and how to integrate it into your processes. For this, we used a scenario in which you, as your company's data scientist, are tasked with making the data sent by the new wind turbine available to your coworkers. After constructing a basic process, we moved on to more advanced functions, such as setting data validity periods, in our second tutorial.
At the end of that tutorial we ran into a problem: if the first employee checks the wind turbine data at 10 am, and on the next day an employee checks it at 8:30 am, i.e. before the 24 hours have passed, the data will not be renewed, even though the turbine has already delivered new data. Solving this issue is the topic of this tutorial. We will use cache dependencies to make our cache reload as soon as the next day begins. For this, we will be using macros.
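To see why a fixed validity period misses the refresh in the scenario above, here is a small Python sketch. It is not part of RapidMiner or the Jackhammer Extension; the function names and the example dates are made up for illustration. It only compares the two ways of deciding whether cached data is stale.

```python
from datetime import datetime, timedelta


def expired_by_validity_period(loaded_at, now, validity=timedelta(hours=24)):
    """True once the fixed validity period has elapsed since the data was loaded."""
    return now - loaded_at >= validity


def is_new_calendar_day(loaded_at, now):
    """True as soon as the calendar date differs from the date of loading."""
    return now.date() != loaded_at.date()


# Hypothetical timestamps matching the scenario: 10:00 on day one, 08:30 on day two.
first_check = datetime(2024, 5, 1, 10, 0)
second_check = datetime(2024, 5, 2, 8, 30)

# Only 22.5 hours have passed, so the validity period has not run out yet.
print(expired_by_validity_period(first_check, second_check))  # False

# The date has changed, so a date-based rule would trigger a reload.
print(is_new_calendar_day(first_check, second_check))  # True
```

The second rule is the behavior we want, and it is what a cache dependency on a date macro gives us.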
Step 1
Open your caching process in RapidMiner and add the Generate Macro operator to it. It is important that the macro operator is executed before the cache. To ensure this, place it in front of the Cache operator and connect the right output port of the macro operator to the left input port of the Cache operator. This way, the order is fixed and you can be sure the cache receives the macro. Note that macros also work without connections; we only add this one to determine the execution order.
Step 2
In the parameter settings of the Generate Macro operator, click on Edit List and enter what is shown in the screenshot below:
This causes the operator to generate a macro containing the current date. Click Apply to confirm.
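What matters about this macro is that its value stays the same for the whole day and changes only when the date does. The Python sketch below illustrates that property. It does not reproduce the expression from the screenshot; the function name and the year-month-day format are assumptions chosen for the example.

```python
from datetime import datetime


def date_macro_value(now):
    """Return a date-only string; the time of day is deliberately dropped."""
    # Assumed format for illustration; any date-only format behaves the same way.
    return now.strftime("%Y-%m-%d")


morning = date_macro_value(datetime(2024, 5, 1, 10, 0))
evening = date_macro_value(datetime(2024, 5, 1, 23, 59))
next_day = date_macro_value(datetime(2024, 5, 2, 8, 30))

print(morning == evening)   # True: same day, same macro value
print(morning == next_day)  # False: the value changes with the date
```

If the macro also contained the time, its value would differ on every run and the cache would never be reused.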
Step 3
Move to the Cache operator, find the parameter for cache dependencies, and click the “Edit Enumeration” button.
Step 4
In the window that opens, enter the name of your macro, in this case “date”, and click “OK”.
Now you are all done. It is as simple as that! As soon as the value of the macro changes, which in our case happens when the date changes, the cache is cleared and the new data is loaded.
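If you would like a mental model of what a cache dependency does, the following Python sketch may help. It is a simplified illustration and not the Jackhammer Extension's actual implementation; the class `DependencyCache`, the loader function, and the macro values are all hypothetical.

```python
class DependencyCache:
    """Toy cache that reloads whenever one of its dependency macros changes."""

    def __init__(self, loader):
        self._loader = loader    # function that fetches fresh data
        self._data = None
        self._snapshot = None    # dependency values seen at the last load
        self._loaded = False

    def get(self, macros, dependencies):
        # Collect the current values of all macros the cache depends on.
        snapshot = tuple(macros.get(name) for name in dependencies)
        if not self._loaded or snapshot != self._snapshot:
            # First access or a dependency changed: discard and reload.
            self._data = self._loader()
            self._snapshot = snapshot
            self._loaded = True
        return self._data


calls = []


def load_turbine_data():
    """Stand-in for the expensive data retrieval; counts how often it runs."""
    calls.append(1)
    return f"reading #{len(calls)}"


cache = DependencyCache(load_turbine_data)
first = cache.get({"date": "2024-05-01"}, ["date"])   # loads the data
second = cache.get({"date": "2024-05-01"}, ["date"])  # same date: served from cache
third = cache.get({"date": "2024-05-02"}, ["date"])   # new date: reloads

print(len(calls))  # 2
```

The data is loaded once per distinct value of the "date" macro, no matter at what time of day the process runs.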