Part 2: Setting Data Validity Periods and Manually Clearing Caches
In our previous tutorial, we introduced the caching functions of the Jackhammer Extension and demonstrated how to integrate the Cache operator into your RapidMiner process. Today, we will discuss how to set a validity period for the cached data in order to automatically reload the data once it has expired. We will also show one of two possible ways to manually clear a cache, should you require it.
Recap: The Scenario
Previously, we used the Cache operator to store the data received from our company’s new wind turbine. This way, the employees checking the power output of the turbine will get a quick response, without putting stress on the database each time they send a request. Additionally, we integrated the preprocessing steps into the Cache process, further speeding up process execution.
So far, so good. In this next step, we will set a data validity period to improve process automation. The wind turbine sends new data only once a day, at midnight, so there is no need to access the database each time an employee looks up the stats. However, the cache has to be updated with the first request of each day to make the newest data accessible. And, since we do not want to go through the trouble of manually clearing the cache every single day, we will let the Cache operator’s parameter settings take this task off our hands.
For other use cases or in special situations, for example an unplanned system reboot, it might be necessary to manually clear the cache and reload the data. To do so, you can either use the “Clear Cache” button of the Cache operator or utilize the Clear Cache operator. In this tutorial, we will cover the former option and discuss the Clear Cache operator in the following tutorial.
Step 1
We will continue to use the process built in the first tutorial. Exit all subprocesses to go to the root level, then click on the Cache operator. Have a look at the parameter settings: underneath the “Clear Cache” button, there is a checkbox labeled “restrict validity”:
Tick the box to open a field for entering the validity period (in milliseconds). Our wind turbine sends new data every 24 hours, so we enter 86400000 into the field that appears.
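If you need a different validity period, the millisecond value is easy to get wrong by a factor of 60 or 1000. The following is a small Python sketch for double-checking the number before typing it into the field. It is not part of RapidMiner or the Jackhammer Extension, and the helper name to_validity_millis is made up for this illustration.

```python
from datetime import timedelta


def to_validity_millis(period: timedelta) -> int:
    """Convert a time span to whole milliseconds (hypothetical helper)."""
    # Dividing one timedelta by another with // yields an integer count.
    return period // timedelta(milliseconds=1)


# 24 hours * 60 minutes * 60 seconds * 1000 milliseconds
print(to_validity_millis(timedelta(hours=24)))  # 86400000
```

The same helper gives 3600000 for a one-hour period, for instance for a data source that updates hourly.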
Step 2
Run the process and the timer starts. If there is a request within the next 24 hours, the database will not be accessed; instead, the data will be pulled from the cache. When the 24 hours are over, the cache will access the database again to load the fresh data, after which another cycle starts.
There is a slight flaw with this: say the first employee checking the wind turbine data does so at 10 am. This is what sets off the 24-hour period. If an employee checks at 8:30 am the next day, i.e. before the 24 hours have passed, the data will not be renewed, even though the turbine has already sent new data at midnight. To work around this issue, we will have to set cache dependencies to reload the cache every 24 hours or as soon as it's a new day. This takes us to the third tutorial on the caching functions of Old World Computing’s Jackhammer Extension.
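To make the timing issue concrete, here is a toy Python model of a cache with a fixed validity period. It is only a sketch of the general idea, not how the Cache operator is implemented: the class TimedCache, the function load_from_database, the dates, and the assumption that the timer starts with the first load are all made up for illustration.

```python
from datetime import datetime, timedelta


class TimedCache:
    """Toy cache with a fixed validity period (not the Jackhammer implementation)."""

    def __init__(self, validity: timedelta, loader):
        self.validity = validity
        self.loader = loader      # hypothetical function that queries the database
        self.loaded_at = None     # time of the last database access
        self.value = None

    def get(self, now: datetime):
        # Reload on the first request or once the validity period has passed.
        expired = self.loaded_at is None or now - self.loaded_at >= self.validity
        if expired:
            self.value = self.loader(now)
            self.loaded_at = now
        return self.value


def load_from_database(now: datetime) -> str:
    # The turbine delivers at midnight, so the newest batch carries today's date.
    return f"batch of {now.date()}"


cache = TimedCache(timedelta(hours=24), load_from_database)

first = cache.get(datetime(2024, 5, 1, 10, 0))   # first request starts the timer
second = cache.get(datetime(2024, 5, 2, 8, 30))  # 22.5 hours later: still cached
third = cache.get(datetime(2024, 5, 2, 10, 5))   # more than 24 hours later: reloaded

print(first)   # batch of 2024-05-01
print(second)  # batch of 2024-05-01 (stale, although new data arrived at midnight)
print(third)   # batch of 2024-05-02
```

The second request returns the previous day's batch because the validity period is measured from the first load rather than from midnight.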
Manually Clearing Caches
As mentioned above, there might be reasons for opting to clear the cache manually, e.g. because of an (unplanned) event requiring a reload of the data, perhaps because the data in the back end changed. Please note that changing the subprocess will automatically cause a reload the next time it is executed. To manually clear the cache, simply click the “Clear Cache” button found in the operator parameters:
The cache is now cleared, and your data will be retrieved from the database rather than from the cache the next time you run the process. Note that this is especially useful during the process design stage, as the Clear Cache button is available only in the Studio environment. In other cases, you will have to use the Clear Cache operator, which we will cover together with Retrieve Cache in an upcoming tutorial.