Tutorial: Write JSON with the WebAutomation Extension

In this tutorial, we demonstrate the new Write JSON operators of our WebAutomation Extension for RapidMiner, which let you build JSON directly in RapidMiner and export it. This is useful when you need to upload the results of your analyses to a web service.

First, we’ll quickly cover the basic principles:

  • The structure you build inside the subprocess of the main Write JSON operator mirrors the nested structure of the JSON, and is defined by operators such as Insert Object, Insert Array, Insert Properties and Insert Value
  • Usually, you start with Insert Object, which can hold any number of entries, typically arrays. Inside an array, you define which data is written to the resulting JSON by using the Insert Properties and Insert Value operators
  • You feed the data to the operator via the input ports, to which you can connect as many data tables as you require
  • To indicate which data rows of the tables belong together, all tables need a column that will be used as identifier
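
The role of the identifier column can be sketched in a few lines of Python. This is only an illustration of the principle, not how the extension is implemented: the two lists stand in for connected data tables, and the column names (`ISBN`, `Title`, `Name`) and all values are placeholders chosen for this sketch.

```python
# Hypothetical stand-ins for two data tables connected to the input ports.
# Column names and values are placeholders, not the tutorial's real data.
books = [
    {"ISBN": "isbn-1", "Title": "Example Title A"},
    {"ISBN": "isbn-2", "Title": "Example Title B"},
]
authors = [
    {"ISBN": "isbn-1", "Name": "Author A"},
    {"ISBN": "isbn-2", "Name": "Author B"},
    {"ISBN": "isbn-1", "Name": "Author C"},
]


def matching_rows(table, id_column, id_value):
    """Return the rows of `table` whose identifier equals `id_value`."""
    return [row for row in table if row[id_column] == id_value]


# For every book, collect the author rows that share its identifier.
for book in books:
    names = [row["Name"] for row in matching_rows(authors, "ISBN", book["ISBN"])]
    print(book["Title"], "->", names)
```

Rows from different tables end up in the same JSON entry only because they carry the same identifier value, which is why every table needs that column.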

We will now explain how to use the Write JSON operator in greater detail using an example. To follow along, you can download this process here. If you are also interested in how to parse JSON with our WebAutomation Extension, have a look at our other tutorials concerning the extension.

1.      Last Things First: The JSON Code

For this tutorial, we show you the result first so that the steps are easier to trace. We will construct a JSON document containing information about books: titles, authors, publishers, keywords and so on. For this, we have prepared three ExampleSets with the required information: one for books, one for authors, and one for keywords. The identifier across all tables is the ISBN. With the help of the identifier, the operator can correctly assemble the corresponding rows into JSON.

This is what we want to end up with:

As you can see, the first array, books, holds information such as title and subtitle, and has two nested arrays: authors, which is also an array of objects, and keywords, which holds scalar values.
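
As a rough sketch of that shape, the following Python snippet builds a comparable structure and prints it as JSON. The property names books, title, subtitle, authors and keywords come from the description above; the author property `name` and every value are placeholders assumed for illustration, not the tutorial's actual data.

```python
import json

# Placeholder values throughout; "name" is an assumed author property.
target = {
    "books": [
        {
            "title": "Example Title",
            "subtitle": "Example Subtitle",
            # Nested array of objects: one object per author.
            "authors": [
                {"name": "First Author"},
                {"name": "Second Author"},
            ],
            # Nested array of scalar values: plain strings, no objects.
            "keywords": ["first keyword", "second keyword"],
        }
    ]
}

print(json.dumps(target, indent=2))
```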

2.      The Main Operator: Write JSON

To start out, find the Write JSON operator in the operator list and add it to the process panel. Take a look at it:

In the parameters, you can choose to make the JSON readable for humans, which makes the nested structure apparent in the resulting file. When this option is not enabled, the code is written compactly, making it difficult for a human to discern the nested structure. If you are sending the JSON straight to a web service, you might not need to read it yourself and can leave the box unchecked. For this tutorial we will activate it so that we can get a better look at our results.
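
The difference between the two output styles can be illustrated with Python's standard `json` module. This is a stand-in for the general idea and makes no claim about the exact layout the operator produces; the data is a placeholder.

```python
import json

# Placeholder data for illustration only.
data = {"books": [{"title": "Example Title", "keywords": ["a", "b"]}]}

# Compact form: no line breaks or indentation, fine for machines.
compact = json.dumps(data, separators=(",", ":"))

# Readable form: line breaks and indentation expose the nesting.
readable = json.dumps(data, indent=2)

print(compact)
print(readable)
```

Both strings carry exactly the same data; only the whitespace differs, so a web service will accept either.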

The operator initially shows one data input port. Each time you connect a port, a new one appears, so you can connect as many data tables to the operator as you need. You receive the JSON in the form of a file object at the output port.

Now, we connect the data to the input ports of the Write JSON operator:

3.      Inside the Subprocess: Insert Object

As mentioned above, the subprocess of the Write JSON operator is where the structure of the JSON is defined. The design here corresponds to that of the resulting code. The next steps will illustrate this in more detail.

To start out, enter the subprocess and add an Insert Object operator.

Notice the three connections coming in, corresponding to the three data sets we added before. Always make sure you connect all your incoming data!

4.      Going Deeper: Insert Array

The next step is to add an Insert Array operator. Again, connect all incoming data and then take a look at the parameters. Here, you will first need to enter specific info regarding your data set.

Scrolling back up to the JSON code we want to write, you will see that our outer array is “books”, so that is what we need to enter as the property name. The array type is objects, as this array contains further objects, but depending on your data you can also select values or arrays as the type. The operator also asks for the ID attribute name (in this case we’re using the ISBN), which, as discussed above, is used to bring the corresponding rows together correctly. With metadata synchronization activated, you should be able to see the attributes of the ExampleSets you fed to the operator. If more than one attribute identifies the same array entry, you can use the button below this field to list them.
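
To see why listing more than one ID attribute can matter, here is a small Python sketch of grouping rows by one attribute versus by a combination of attributes. It reflects one reading of the parameter, not the extension's internals; the `Edition` column and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; "Edition" is an invented second ID attribute.
rows = [
    {"ISBN": "isbn-1", "Edition": 1, "Author": "Author A"},
    {"ISBN": "isbn-1", "Edition": 1, "Author": "Author B"},
    {"ISBN": "isbn-1", "Edition": 2, "Author": "Author A"},
]


def group_by_id(rows, id_attributes):
    """Group rows by the tuple of values found in `id_attributes`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in id_attributes)
        groups[key].append(row)
    return dict(groups)


single = group_by_id(rows, ["ISBN"])
composite = group_by_id(rows, ["ISBN", "Edition"])

# Number of distinct array entries under each choice of ID attributes.
print(len(single), len(composite))
```

With a single ID attribute, all three rows fall into one entry; with the combined key, the two editions become separate entries.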

5.      Inside the First Array

For the next step we have to think about our data and the structure of the JSON code we want. To illustrate the design principles better, we’ll show you the end result again:

You can see two things in this section of the code: An object of the “books” array has a few properties, which here are highlighted by the yellow box. It also has two more nested arrays, in blue: authors and keywords. In the RapidMiner process, this is represented as follows:

The principles of how to construct the process to achieve the correct structure for the JSON code you are writing should become clear here – the properties on the level of the outer array are inserted via the Insert Properties operator, while the nested arrays are achieved by adding two more Insert Array operators on this level, one for authors and one for keywords. In the parameters, you will again have to set the property name, array type, and ID attribute name as we did for the first, outer array. We’ll get to that later, taking a special look at the keywords array, as it is an array of the type scalar values.

6.      Insert Properties

In the screenshot above, you can also see the parameters of the Insert Properties operator. It is rather straightforward: click the button next to “insert properties” to provide a list of the attributes from the ExampleSets that you want to insert into the JSON. In this case, the list should look like this:

For each attribute, you have to choose a property name for the JSON as well as indicate whether it is a string, number, or boolean. Once you have completed your list, hit apply. Repeat this process for other arrays nested in this one. In our example, we would now enter the subprocess of the authors array, add another Insert Properties operator there, and fill the list with the attributes found in our data. If you have time information in your data, be sure to set your desired date format and time zone in the parameters.
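
Conceptually, the list you fill in maps each attribute to a JSON property name and a type. The Python sketch below mimics that mapping for a single row. It is an illustration under assumptions: the attribute names, the `pages` and `in_print` properties, and the conversion rules are all invented here and may differ from what the operator does.

```python
# Each entry: (attribute in the table, property name in the JSON, type).
# All names are placeholders for this sketch.
property_list = [
    ("Title", "title", "string"),
    ("Pages", "pages", "number"),
    ("InPrint", "in_print", "boolean"),
]


def to_bool(value):
    """Interpret common textual spellings of true (assumed rule)."""
    return str(value).strip().lower() in ("true", "1", "yes")


CONVERTERS = {"string": str, "number": float, "boolean": to_bool}


def insert_properties(row, property_list):
    """Build one JSON object from a table row using the property list."""
    return {
        prop: CONVERTERS[kind](row[attr])
        for attr, prop, kind in property_list
    }


row = {"Title": "Example Title", "Pages": "320", "InPrint": "true"}
book = insert_properties(row, property_list)
print(book)
```

Choosing the type matters because it decides whether a value appears in the JSON as a quoted string, a bare number, or true/false.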

7.      Inserting Arrays of Scalar Values

Lastly, we’ll discuss the keywords array, as this is an array of the type scalar values and as such, has slightly different settings. To insert scalar value arrays, simply select this option from the dropdown list in the parameters of the Insert Array operator.

Then, to actually insert the values, choose the Insert Value operator instead of Insert Properties when inside the array:

In its parameters, select the correct attribute, in this case Keyword, and indicate the format – string, number, or boolean – as you did with the Insert Properties operator.
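
The difference between an array of objects and an array of scalar values is easiest to see side by side. In this Python sketch, the rows and the `keyword` property name are placeholders; only the contrast between the two shapes is the point.

```python
import json

# Placeholder keyword rows for one book.
keyword_rows = [
    {"ISBN": "isbn-1", "Keyword": "alpha"},
    {"ISBN": "isbn-1", "Keyword": "beta"},
]

# Array of objects: each row becomes an object with a named property.
as_objects = [{"keyword": row["Keyword"]} for row in keyword_rows]

# Array of scalar values: each row contributes just the value itself.
as_values = [str(row["Keyword"]) for row in keyword_rows]

print(json.dumps({"keywords": as_objects}))
print(json.dumps({"keywords": as_values}))
```

The second form is what the keywords array in this tutorial aims for: a flat list of strings without property names.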

8.      Finishing Up

For this example, this is as complex as we’ll go, so we can now make sure all necessary connections are made. The Write JSON operator will create a file for us, and for the purpose of this tutorial, we will add a Read Document operator after it to be able to have a look at the JSON. In a realistic scenario, this is where you could add the Send Request operator also found in the WebAutomation extension in order to send your JSON to a web service.
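
For readers who want a feel for what sending the JSON to a web service involves outside RapidMiner, here is a minimal Python sketch using the standard library. It is not the Send Request operator: the endpoint URL is a made-up placeholder, the payload is dummy data, and the actual sending is commented out so the snippet runs without network access.

```python
import json
import urllib.request

# Dummy payload standing in for the JSON written by the process.
payload = {"books": [{"title": "Example Title"}]}
body = json.dumps(payload).encode("utf-8")

request = urllib.request.Request(
    "https://example.com/api/books",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to actually send the request to a real service:
# with urllib.request.urlopen(request) as response:
#     print(response.status)

print(request.get_method(), request.full_url)
```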

Executing the process, we get the following result:

Looks like it worked!

We hope this tutorial helped you understand how to write JSON using our WebAutomation Extension for RapidMiner. If you have any further questions, contact us here or message us on Twitter or LinkedIn!