Parsing JSON with OWC's WebAutomation Extension: Extracting Arrays of Scalar Values
Jana_OWC
Hi there,
I'm back with another tutorial, concluding the posts on how to parse JSON in RapidMiner using Old World Computing's WebAutomation Extension. I hope you found the tutorials useful; if you have any further questions, don't hesitate to ask! Also, if you are using any of our extensions and would like to see a tutorial about certain features, feel free to send me a message here or contact us on Twitter or LinkedIn.
In the previous posts, we first discussed the basic functions of the WebAutomation Extension and then demonstrated how to extract not just one but multiple relational example sets from a single JSON string. As mentioned there, one more feature remains to be shown: extracting arrays of scalar values. If you like, you can also open the tutorial process in RapidMiner; you will find it under Partner Materials - Old World Computing in the Community Samples Repository.
As we will continue with our example data from before, let's first have another look at the JSON:
So far, we have discussed extracting the properties of the books array (title, subtitle, language and so on), and we also covered how to extract the information in the nested authors array. As you can see above, however, both the books and the authors arrays are arrays of objects. Looking more closely at the JSON, you will notice one array that we have not processed yet: keywords. Unlike authors, keywords is an array of single string values rather than of nested objects. In the following, we will demonstrate how to extract this information into a third table.
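To make the difference between the two kinds of arrays concrete, here is a small Python analogy; it is not part of the extension and only illustrates the idea. It assumes a hypothetical, heavily shortened JSON with the same shape as the tutorial data: the field names follow the post, but the author field name ("name") and all values are invented. An array of objects has dictionaries as elements, while keywords holds plain strings.

```python
import json

# Hypothetical, shortened stand-in for the tutorial JSON: the field names
# follow the post, but the "name" field and every value are invented.
RAW = """
{
  "books": [
    {
      "title": "Book A",
      "subtitle": "A first example",
      "language": "English",
      "authors": [{"name": "Author One"}, {"name": "Author Two"}],
      "keywords": ["json", "parsing"]
    },
    {
      "title": "Book B",
      "subtitle": "A second example",
      "language": "German",
      "authors": [{"name": "Author Three"}],
      "keywords": ["web", "automation", "etl"]
    }
  ]
}
"""

SCALAR_TYPES = (str, int, float, bool)


def array_kind(values):
    """Classify a JSON array by the type of its elements."""
    if values and all(isinstance(v, dict) for v in values):
        return "objects"  # extracted with Extract Properties in the tutorial
    if values and all(isinstance(v, SCALAR_TYPES) for v in values):
        return "scalars"  # extracted with Extract Scalar in the tutorial
    return "mixed or empty"


data = json.loads(RAW)
first_book = data["books"][0]
for name, values in (
    ("books", data["books"]),
    ("authors", first_book["authors"]),
    ("keywords", first_book["keywords"]),
):
    print(name, "->", array_kind(values))
# books -> objects
# authors -> objects
# keywords -> scalars
```

The point of the check is simply that an element of keywords has no properties to pick out: the element itself is the value.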
First, here is a reminder of what the inside of the Process Array operator should look like at this point. As discussed before, the structure of the process mirrors the structure of the original JSON, so we will continue working on the level of the books array.
Now add another Process Array operator and connect it to the Multiply operator and to the third Parse Specification port on the right. Remember to also make the new connections on all higher levels, as well as between the Process Object and Parse operators, so that you receive your ExampleSet.
Inside the new operator, we build a sub-process similar to the ones we use to extract the authors and the other properties. The only difference is that instead of the Extract Properties operator, we now use the Extract Scalar operator provided by the WebAutomation Extension. Enter an attribute name (Keywords) and select the correct attribute type, in this case polynominal. Do not forget to add a Commit Row operator to the sub-process to express that every entry should be represented by a row.
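As a rough mental model of what this sub-process does, the following Python sketch walks the keywords array of every book and emits one row per entry. It is only an analogy, not the extension's implementation: the JSON is hypothetical with invented values, and the attribute name "ID" and the numbering starting at 1 are assumptions, since the actual ID scheme of Process Object may differ.

```python
import json

# Hypothetical stand-in for the tutorial JSON; only the fields needed here,
# with invented values.
RAW = """
{"books": [
  {"title": "Book A", "keywords": ["json", "parsing"]},
  {"title": "Book B", "keywords": ["web", "automation", "etl"]}
]}
"""


def extract_keyword_rows(books, attribute_name="Keywords"):
    """Turn each scalar entry of every book's keywords array into one row."""
    rows = []
    # The enumerate counter stands in for the ID that Process Object assigns
    # (assumption: the real numbering scheme may differ).
    for book_id, book in enumerate(books, start=1):
        # Like Process Array: visit every entry of the keywords array.
        for keyword in book.get("keywords", []):
            # Like Extract Scalar: the entry itself is the value.
            row = {"ID": book_id, attribute_name: keyword}
            # Like Commit Row: the finished row is added to the table.
            rows.append(row)
    return rows


books = json.loads(RAW)["books"]
for row in extract_keyword_rows(books):
    print(row)
# {'ID': 1, 'Keywords': 'json'}
# {'ID': 1, 'Keywords': 'parsing'}
# {'ID': 2, 'Keywords': 'web'}
# {'ID': 2, 'Keywords': 'automation'}
# {'ID': 2, 'Keywords': 'etl'}
```

Loosely speaking, leaving out the `rows.append(row)` line corresponds to forgetting Commit Row: the values are read, but they never end up as rows in the table.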
When you run the process, you should now get three individual ExampleSets: one with the properties of the books array, one with the authors' names, and a third one with the keywords assigned to the books. The keywords sub-process is nested within the Process Object operator, which, as you might remember from the previous tutorials, we set up to assign an ID to each JSON object. The new third ExampleSet therefore also includes an ID that corresponds to the other ExampleSets, which lets you relate the tables to each other. (If your data already includes an ID, go here to read up on how to use it as the connecting element.)
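To illustrate what "relating the tables" means, here is a minimal Python sketch of an inner join on the shared ID. The two tables are hypothetical, with invented values, and are modeled as lists of dictionaries; inside RapidMiner you would typically do the equivalent with a Join operator on the ID attribute.

```python
# Hypothetical example sets as lists of dicts; the ID column links them.
books = [
    {"ID": 1, "title": "Book A", "language": "English"},
    {"ID": 2, "title": "Book B", "language": "German"},
]
keywords = [
    {"ID": 1, "Keywords": "json"},
    {"ID": 1, "Keywords": "parsing"},
    {"ID": 2, "Keywords": "web"},
    {"ID": 2, "Keywords": "automation"},
]


def join_on_id(left, right, key="ID"):
    """Inner join: attach the matching left-hand row to every right-hand row."""
    # Assumes the key is unique in `left`, as the book IDs are here.
    index = {row[key]: row for row in left}
    return [{**index[row[key]], **row} for row in right if row[key] in index]


for row in join_on_id(books, keywords):
    print(row["title"], "->", row["Keywords"])
# Book A -> json
# Book A -> parsing
# Book B -> web
# Book B -> automation
```

Because each book appears once in the books table but several times in the keywords table, the join produces one row per keyword, each carrying the book's properties.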
Summary
This concludes our tutorials on JSON parsing with the new WebAutomation Extension. You should now be able to use it to turn nested JSON data into linked ExampleSets. For further help with the extension, you can also check the tutorials in the Help tab of RapidMiner Studio when one of the extension's operators is selected. Also be sure to have a look at its other useful functions, such as the JSON request operators, which fetch data directly from a web service.