Tutorial for the "JSON processing with jq" extension

BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 485   Unicorn
edited March 12 in Knowledge Base
You might have some complex JSON documents that you'd like to process in RapidMiner. This tutorial demonstrates the functionality of the JSON processing with jq extension.
The first step is to install the extension from the Marketplace if you don't have it yet. Click the Extensions/Marketplace menu entry, and in the Marketplace window, enter JSON in the search box. Select the extension for installation, and let Studio restart itself when the installation is done.

Let's play with a publicly available data set from the Vienna Open Data server.
Here's a list of playgrounds in the city of Vienna, with some attributes and even geocoordinates:
The page lists the data set in several formats. We're interested in the JSON document and will use its URL.

The simplest RapidMiner process to get the contents of this URL and process them with jq looks like this:

In Open File, set the resource type to URL and paste the URL of the JSON resource. In Read Document, uncheck "extract text only", as we don't want to change the input. Then add a Process Document with jq operator and connect its input and output ports.

By default, Process Document with jq is set up for JSON output, indenting (formatting) the resulting JSON document, and with the simplest jq expression ".", which just copies (and formats) the incoming document.
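For readers who want to see what the default "." expression with indentation amounts to outside of Studio, here is a minimal Python sketch. It is not how the extension is implemented; the compact input string is a made-up stand-in for the real document.

```python
import json

# Hypothetical compact input; the real document comes from the Open Data URL.
raw = '{"type":"FeatureCollection","totalFeatures":1,"features":[]}'

# Rough equivalent of the jq expression "." with indentation enabled:
# parse the document and write it back unchanged, only formatted.
formatted = json.dumps(json.loads(raw), indent=2)
print(formatted)
```

The content is identical before and after; only the whitespace changes.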
So we get the first result from the process:

The document contains a kind of header (type: FeatureCollection and totalFeatures from the GeoJSON standard) and an array of "features" (the playgrounds).
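The structure described above can be sketched with a tiny sample. Only the key names (type, totalFeatures, features, ANL_NAME, SPIELPLATZ_DETAIL, coordinates) come from this tutorial; the playground names, details and coordinates are invented for illustration.

```python
import json

# Hypothetical two-playground sample in the shape described in the tutorial.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "totalFeatures": 2,
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [16.37, 48.21]},
     "properties": {"ANL_NAME": "Example Park A",
                    "SPIELPLATZ_DETAIL": "Sandbox, swings"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [16.40, 48.19]},
     "properties": {"ANL_NAME": "Example Park B",
                    "SPIELPLATZ_DETAIL": "Slide"}}
  ]
}
""")

# The "header" is everything except the features array.
header = {key: value for key, value in sample.items() if key != "features"}
feature_count = len(sample["features"])

print(header)
print(feature_count)
```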

We're interested in the name (ANL_NAME), the playground details (SPIELPLATZ_DETAIL), and the geocoordinates of every playground.
To develop the jq expression, we go to jqplay.org and paste the JSON data into the JSON field. Then we interactively develop the expression that selects the data we want.

The first step is ".features". This selects the features array (discarding the header) and returns it as a single result: one large array containing every element.
If we change this to ".features[]", we get a sequence of separate objects, one per playground, which is better for further processing.
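The difference between the two expressions is easy to miss, so here is a small Python analogy. The two helper functions are hypothetical names that only mimic the number of results jq would produce; the sample data is invented.

```python
import json

# Hypothetical sample; only the key names come from the tutorial.
doc = json.loads("""
{"type": "FeatureCollection", "totalFeatures": 2,
 "features": [
   {"properties": {"ANL_NAME": "Example Park A"}},
   {"properties": {"ANL_NAME": "Example Park B"}}
 ]}
""")

def select_features(d):
    """Mimics '.features': a single result, the whole array."""
    return [d["features"]]

def iterate_features(d):
    """Mimics '.features[]': one result per array element."""
    return list(d["features"])

print(len(select_features(doc)))   # number of results from ".features"
print(len(iterate_features(doc)))  # number of results from ".features[]"
```

With one result per playground, every later step after the pipe runs once for each playground.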
In jq, we use the pipe symbol | to chain processing steps. Now we list the elements we want after a pipe. The expression is:
.features[] | [.properties.ANL_NAME, .properties.SPIELPLATZ_DETAIL, .geometry.coordinates[0], .geometry.coordinates[1] ]
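To make the expression concrete, this Python sketch performs the same selection on an invented sample: for every feature, it builds a flat list of name, details, and the first and second coordinate. The values are assumptions; the field names are the ones used in the jq expression.

```python
import json

# Hypothetical sample; values are invented, key names match the expression.
doc = json.loads("""
{"features": [
  {"geometry": {"coordinates": [16.37, 48.21]},
   "properties": {"ANL_NAME": "Example Park A",
                  "SPIELPLATZ_DETAIL": "Sandbox, swings"}},
  {"geometry": {"coordinates": [16.40, 48.19]},
   "properties": {"ANL_NAME": "Example Park B",
                  "SPIELPLATZ_DETAIL": "Slide"}}
]}
""")

# One flat row per feature, in the same order as the jq array constructor.
rows = [
    [
        feature["properties"]["ANL_NAME"],
        feature["properties"]["SPIELPLATZ_DETAIL"],
        feature["geometry"]["coordinates"][0],
        feature["geometry"]["coordinates"][1],
    ]
    for feature in doc["features"]
]

for row in rows:
    print(row)
```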

This gives us a nice flat structure that we can easily process with RapidMiner, especially if we let the operator convert it to CSV.

(Uncheck "first row as names" in the Read CSV operator, and change the separator to comma.)
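The two Read CSV settings make sense once you look at what such CSV output looks like: there is no header row, and fields are separated by commas. The following Python sketch illustrates this with invented rows. It uses Python's csv module, whose quoting may differ in detail from what the operator emits.

```python
import csv
import io

# Hypothetical flat rows, as produced by the selection step.
rows = [
    ["Example Park A", "Sandbox, swings", 16.37, 48.21],
    ["Example Park B", "Slide", 16.40, 48.19],
]

# Write comma-separated lines without a header row.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read them back: the first line is data, not column names,
# and a comma inside a quoted field does not split the field.
parsed = list(csv.reader(csv_text.splitlines()))
print(parsed[0])
```

Values that contain a comma are quoted, so the details column survives the round trip as a single field.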
The result is a normal RapidMiner example set:

Check out the documentation of jq if you need to process even more complex documents. jq offers additional functionality, such as extracting a variable-length array of values (like tags) into a table structure, counting elements, or regular expression replacements.
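As an illustration of what turning a variable-length array into a table means, here is a Python sketch of the idea. The "tags" field is purely hypothetical and is not part of the playground data set; in jq you would express the same logic with its own iteration and array constructs.

```python
import json
from collections import Counter

# Hypothetical documents with a variable-length "tags" array per playground.
doc = json.loads("""
{"features": [
  {"properties": {"ANL_NAME": "Example Park A", "tags": ["swings", "sandbox"]}},
  {"properties": {"ANL_NAME": "Example Park B", "tags": ["swings"]}},
  {"properties": {"ANL_NAME": "Example Park C", "tags": []}}
]}
""")

# One (name, tag) row per tag: a long table instead of a nested array.
tag_rows = [
    (feature["properties"]["ANL_NAME"], tag)
    for feature in doc["features"]
    for tag in feature["properties"]["tags"]
]

# Counting elements: how often each tag occurs across all playgrounds.
tag_counts = Counter(tag for _, tag in tag_rows)

print(tag_rows)
print(tag_counts)
```

A playground without tags contributes no rows, which is usually what you want in a long table.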


  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 485   Unicorn
    Note: It seems that there are problems using the extension with the Free edition of RapidMiner.
    In the paid, trial, and educational editions, it might be necessary to set "Grant additional permissions to unsigned extensions" in the Preferences on the Startup tab.
  • pacog Member Posts: 2 Newbie
    Does it currently work? I always get an error while reading JSON files or URLs:

    "The Execute jq script operator in this process failed"

    I just used the default values with the "Open File" and "Read Document" operators.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 485   Unicorn

    It works in my Studio 9.7.002 with the aforementioned limitation (Grant additional permissions ...).

    The library I'm using is doing some operations that RapidMiner doesn't like.

    If the extension doesn't work for you, you can right-click the operator, "Open defining process", and use it as a subprocess with macro parametrization. You remove the first five operators that create macros from the operator parameters, and create the same macros in the process context. Then you save the process and use it in other processes by specifying the parameters with the Macros button.

  • pacog Member Posts: 2 Newbie
    Magic! Without doing anything but restarting RapidMiner several times, it now works! I also have version 9.7.002.

    Thanks anyway!