The core functionality of the extension is highly efficient reading of JSON data into one or more tables. To this end, the user creates a RapidMiner process that mirrors the JSON structure and specifies in which table and column each value is to be stored. Because the format is defined by the process itself, it can be designed dynamically and adapted to the data. If the JSON contains relational structures, multiple tables connected by unique IDs can be extracted, so that no information is lost when the hierarchical JSON is broken down into tables.
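To illustrate what "multiple tables connected by unique IDs" means, the following Python sketch splits a nested document into an orders table and an items table linked by an `order_id` column. It is not how the extension works internally. In the extension you model the structure with operators rather than writing code. The document layout, the field names and the function `split_orders` are hypothetical.

```python
import json


def split_orders(document: str) -> tuple[list[dict], list[dict]]:
    """Break a nested orders document into a parent table and a child table."""
    data = json.loads(document)
    orders: list[dict] = []
    items: list[dict] = []
    for order in data["orders"]:
        # Parent table: one row per order, identified by its ID.
        orders.append({"order_id": order["id"], "customer": order["customer"]})
        # Child table: one row per nested item, carrying the parent's ID as a
        # foreign key so the hierarchy can be reconstructed later.
        for item in order.get("items", []):
            items.append({"order_id": order["id"], "sku": item["sku"], "qty": item["qty"]})
    return orders, items


doc = """{"orders": [
  {"id": 1, "customer": "ACME", "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
  {"id": 2, "customer": "Globex", "items": [{"sku": "A", "qty": 5}]}
]}"""
orders, items = split_orders(doc)
```

Every item row keeps the ID of the order it came from. As a result, no parent/child relationship is lost when the hierarchy is flattened.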
The defined JSON structure can be used to process JSON data from various sources: the data can come from a data table produced by another operator in the process, from a file represented as a FileObject within RapidMiner, or directly from a web service request. In combination with RapidMiner's core operators, the extension covers virtually every source of JSON data, so you can integrate the new operators into any RapidMiner process that needs JSON data.
Besides importing and processing JSON, the WebAutomation Extension can also generate JSON directly in RapidMiner for further tasks, for example to upload analytic results. This makes working with web services even more efficient for data scientists.
To generate JSON in RapidMiner, we have added the new Write JSON operator to the WebAutomation Extension. You can connect as many ExampleSets to it as necessary. Its subprocess compiles them into JSON using operators such as Insert Array and Insert Properties, and the process design mirrors the structure of the JSON code to be generated.
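The reverse direction, nesting flat tables back into a JSON document, can be sketched in plain Python as follows. This is only a conceptual analogy to what a Write JSON subprocess expresses with Insert Properties and Insert Array. It is not the operator's implementation. The table layout and the function `build_orders_json` are hypothetical.

```python
import json
from collections import defaultdict


def build_orders_json(orders: list[dict], items: list[dict]) -> str:
    """Nest child rows under their parent rows and serialize the result."""
    # Group the child rows by the parent ID they reference.
    items_by_order: dict = defaultdict(list)
    for item in items:
        items_by_order[item["order_id"]].append({"sku": item["sku"], "qty": item["qty"]})
    # Parent attributes become object properties; the matching child rows
    # become a nested array (empty if an order has no items).
    payload = {
        "orders": [
            {"id": o["order_id"], "customer": o["customer"], "items": items_by_order[o["order_id"]]}
            for o in orders
        ]
    }
    return json.dumps(payload)


print(build_orders_json([{"order_id": 1, "customer": "ACME"}], [{"order_id": 1, "sku": "A", "qty": 2}]))
# prints {"orders": [{"id": 1, "customer": "ACME", "items": [{"sku": "A", "qty": 2}]}]}
```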
To help you get started with the extension, every operator comes with a help text that includes example processes. In addition, several tutorials in our help section explain the functionality in greater detail: JSON Parsing & Writing: Tutorials.
The extension also provides extended functionality for integrating web services into the process flow, eliminating numerous shortcomings of the existing web extension.
The extension provides three operators for accessing web services. Two of them send one request for every row of a given data set. The individual results can be returned as FileObjects, which makes these operators a good choice for retrieving files such as ZIP archives and CSV files, including binary content. Alternatively, the response can be interpreted as JSON immediately: the data is parsed while it is streamed, which avoids unnecessary copies and thus reduces memory usage. The third operator adds the results of the requests to the data set as new attributes, which is especially useful for data enrichment.
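The "one request per row, results as new attributes" pattern used for data enrichment can be sketched in Python with the standard library. This is an illustration of the pattern only, not the extension's operator. The functions `http_get` and `enrich_rows`, the `resp_` column prefix and the example URL template are hypothetical. The fetch function is injectable so the logic can be tried without a network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable


def http_get(url: str) -> str:
    """Send a plain GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def enrich_rows(rows: list[dict], url_template: str,
                fetch: Callable[[str], str] = http_get) -> list[dict]:
    """Send one request per row and add the JSON response fields as new columns."""
    enriched = []
    for row in rows:
        # Fill the URL from the row's own attributes, URL-encoding each value.
        url = url_template.format(**{k: urllib.parse.quote(str(v)) for k, v in row.items()})
        response = json.loads(fetch(url))
        # Keep the original attributes and add the response fields with a
        # prefix to reduce the risk of clashing with existing columns.
        enriched.append({**row, **{f"resp_{key}": value for key, value in response.items()}})
    return enriched
```

A call such as `enrich_rows([{"city": "Berlin"}], "https://api.example.com/weather?city={city}")` would send one GET request for that row against the (hypothetical) endpoint and return the row extended by the response fields.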
All three operators use connections in which the user can provide not only authentication details but also a rate limit. The rate limit is enforced across the entire RapidMiner instance, so it is observed even when several processes or parts of a process request data from the same server simultaneously. This makes it easier to prevent annoying account blocks and IP bans on rate-limited web services, which would otherwise take considerable effort to avoid.
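The idea of one rate limit shared by all concurrent callers can be sketched in Python as below. This is not the extension's implementation. It uses a simple minimum-interval limiter, a swapped-in technique; token buckets are another common choice. The class name `SharedRateLimiter` is hypothetical.

```python
import threading
import time


class SharedRateLimiter:
    """Spaces calls at least 1/rate seconds apart for every thread that shares
    this instance (a minimum-interval limiter, not a token bucket)."""

    def __init__(self, rate_per_second: float) -> None:
        if rate_per_second <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def acquire(self) -> None:
        # Reserve the next free time slot while holding the lock.
        with self._lock:
            slot = max(self._next_slot, time.monotonic())
            self._next_slot = slot + self._interval
        # Wait outside the lock so other threads can reserve their own slots.
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

Every code path that talks to the same server calls `acquire()` on one shared instance before sending a request. Because the slot reservation is serialized by the lock, the spacing holds no matter how many threads are sending requests at the same time.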
Internally, the extension uses a modern HTTP library that keeps connections open and reuses them where possible, significantly reducing response times. The extension now also makes it possible to control headers and cookies dynamically: they are returned as data sets and can be passed in as a data set for the next request. This makes it easy to model virtually any HTTP logic in RapidMiner processes.
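Treating cookies as data rows that flow from one request to the next can be illustrated with Python's standard `http.cookies` module. The sketch parses `Set-Cookie` response headers into rows and builds a `Cookie` header for the next request from those rows. It is not the extension's implementation, and the functions `cookies_to_rows` and `rows_to_cookie_header` are hypothetical. In ordinary Python code, `http.cookiejar` would typically handle this automatically; the manual round trip is only there to show the table representation.

```python
from http.cookies import SimpleCookie


def cookies_to_rows(set_cookie_headers: list[str]) -> list[dict]:
    """Turn Set-Cookie response headers into table rows (one row per cookie)."""
    rows = []
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            # Attributes that were not set, such as Path, come back as "".
            rows.append({"name": name, "value": morsel.value, "path": morsel["path"]})
    return rows


def rows_to_cookie_header(rows: list[dict]) -> str:
    """Build the Cookie request header for the next request from the rows."""
    return "; ".join(f"{row['name']}={row['value']}" for row in rows)


rows = cookies_to_rows(["session=abc123; Path=/", "lang=en"])
next_headers = {"Cookie": rows_to_cookie_header(rows)}
```

Because the cookies exist as ordinary rows in between, they can be filtered or modified before the next request is sent.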
List of Operators
- Parse JSON
- Parse JSON from Data
- Parse JSON from File
- Process Object
- Process Array
- Extract Properties
- Extract Scalar
- Commit Row
- Send Request
- Send Request from Data
- Send JSON Request
Number of Users
- 1 named user
- 5 named users
* plus VAT
Do you have any questions, criticisms or suggestions about our extension?
Please do not hesitate to contact us.