Part 3: Using Macros to Set Cache Dependencies
The first tutorial on using the caching functions of the Jackhammer Extension demonstrated the basic features of the Cache operator and how to integrate it into your processes. For this, we constructed a scenario in which you, as the company’s data scientist, are tasked with making the data sent by the new wind turbine available to your coworkers. In our second tutorial, we built on that basic process and turned to one of the more advanced functions, setting data validity periods. At the end of it, we ran into a problem: suppose the first employee checks the wind turbine data at 10:00 am, and on the next day another employee checks it at 8:30 am. Only 22.5 hours have passed, so the 24-hour validity period has not yet expired and the cached data is not renewed, even though the turbine has, in principle, already sent new data.
This tutorial shows how to solve this issue. We will use cache dependencies so that the cache reloads as soon as a new day begins. For this, we will use macros:
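To make the timing problem concrete, the following Python sketch (an illustration, not RapidMiner code) checks whether a 24-hour validity period has expired at the second check. The timestamps are hypothetical, and it assumes a cache entry expires once the full period has elapsed. For contrast, it also compares the calendar dates, which is the idea behind the date-based dependency built below.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps: the first check fills the cache at 10:00 am,
# the second check happens the following morning at 8:30 am.
cached_at = datetime(2024, 5, 16, 10, 0)
next_check = datetime(2024, 5, 17, 8, 30)

# Data validity period as set up in the second tutorial.
validity = timedelta(hours=24)

elapsed = next_check - cached_at
expired = elapsed >= validity  # False: the stale cached data would be served

# Comparing calendar dates instead detects that a new day has begun.
day_changed = cached_at.date() != next_check.date()

print(elapsed, expired, day_changed)
```

The validity period alone cannot tell that a new day has started; a value derived from the date can.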
Open your caching process in RapidMiner and add the Generate Macro operator to it. The macro operator must be executed before the Cache operator. To ensure this, place it in front of the Cache operator and connect an output port of Generate Macro (on its right side) to an input port of the Cache operator (on its left side). This fixes the execution order, so you can be sure the macro already exists when the Cache operator runs. Note that macros also work without such a connection; we only add it to determine the execution order.
In the parameter settings of the Generate Macro operator, click on Edit List. Enter “date” as the macro name and the following as its expression: date_str_custom(date_now(),"yyyy-MM-dd")
This makes the operator generate a macro named “date” that contains the current date in the format yyyy-MM-dd. Click Apply.
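The expression date_str_custom(date_now(),"yyyy-MM-dd") formats the current date with a date pattern. As a rough Python analogue (an illustration of the resulting values, not how RapidMiner evaluates the expression), the sketch below maps the pattern to strftime codes, assuming the pattern letters follow Java’s SimpleDateFormat conventions. Note the capital MM: in that syntax, lowercase mm means minutes, which would make the macro change every minute. The function name date_macro is hypothetical.

```python
from datetime import datetime


def date_macro(now: datetime) -> str:
    """Hypothetical Python analogue of date_str_custom(date_now(), "yyyy-MM-dd").

    yyyy -> %Y (four-digit year), MM -> %m (zero-padded month),
    dd -> %d (zero-padded day of month).
    """
    return now.strftime("%Y-%m-%d")


# Two runs on the same day yield the same macro value...
print(date_macro(datetime(2024, 5, 16, 10, 0)))
print(date_macro(datetime(2024, 5, 16, 23, 59)))
# ...while the first run on the next day yields a new one.
print(date_macro(datetime(2024, 5, 17, 8, 30)))
```

Because the value only depends on the calendar date, it stays constant throughout a day and changes exactly once at midnight.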
Move to the Cache operator, find the parameter for cache dependencies and click the “Edit Enumeration” button.
In the window that opens, enter the name of your macro, in this case “date”, and click “OK”.
Now you are done; it is as simple as that! Because Generate Macro re-evaluates the date every time the process runs, the first run on a new day produces a new macro value. As soon as the macro changes, the cache is cleared and the new data is loaded. Depending on your use case, it might be more helpful to set a data validity period or to use cache dependencies to automate cache updates, so feel free to experiment with both. You are also not limited to dates: any macro can serve as a cache dependency.
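The mechanism behind a cache dependency can be sketched in a few lines of Python. This is a hypothetical model of the behaviour described above, not the Jackhammer Extension’s implementation: the class DependencyCache and the loader function are illustrative names. Each entry remembers the dependency value it was loaded with and is reloaded whenever that value differs.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Tuple


class DependencyCache:
    """Hypothetical cache that reloads an entry when its dependency value changes."""

    def __init__(self) -> None:
        # cache key -> (dependency value at load time, cached data)
        self._entries: Dict[str, Tuple[str, Any]] = {}

    def get(self, key: str, dependency: str, loader: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[0] == dependency:
            return entry[1]  # dependency unchanged: serve the cached data
        data = loader()      # first request or changed dependency: reload
        self._entries[key] = (dependency, data)
        return data


loads = []  # records every (expensive) load for inspection


def load_turbine_data(when: datetime) -> str:
    # Stand-in for retrieving the wind turbine data.
    loads.append(when)
    return f"turbine data as of {when:%Y-%m-%d %H:%M}"


cache = DependencyCache()
for check in (datetime(2024, 5, 16, 10, 0),
              datetime(2024, 5, 16, 15, 0),
              datetime(2024, 5, 17, 8, 30)):
    date_value = check.strftime("%Y-%m-%d")  # plays the role of the "date" macro
    print(cache.get("turbine", date_value, lambda: load_turbine_data(check)))
```

The 3:00 pm check on the first day is served from the cache, while the 8:30 am check on the next day triggers a reload even though fewer than 24 hours have passed.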