RapidMiner

How to interact with Google Cloud APIs with the Web Mining extension

on 12-20-2016 12:02 PM - edited on 12-20-2016 12:03 PM by Community Manager

This is the first of several articles to help people use external APIs from within RapidMiner.  Some APIs need no effort at all because an extension already wraps them.  A good example is the NamSor extension by Elian Carsenat, which predicts gender and nationality from first and last names.  I strongly encourage you to take advantage of it if you work with names.

 

Unfortunately, very few APIs are as easy to use in RapidMiner as NamSor.  Because every API works differently, I'd like to show how to use some common ones; if you need a different one, you can likely figure it out from these examples.

 

GOOGLE CLOUD API

 

Google has an amazing suite of APIs that are available to developers at relatively low cost:

  • Google Cloud Storage
  • Google Cloud Translation
  • Google Drive
  • Google Maps Directions
  • Google Maps JavaScript
  • Google Picker

Here I am going to show how to send text to the Google Cloud Translation API and have it detect the language.  You can of course adapt this to whichever API you want.

 

1. You will need to create a Google Cloud developer account to get an API key.  You do this on https://cloud.google.com.  The key should look like a long string of alphanumeric characters.  Keep this key secure as it is the way Google authenticates and allocates the billing.

 

2. If you have not already done so, download the Web Mining extension in RapidMiner Studio.

 

3. Build a process that sends a text attribute to the Enrich Data by Webservice operator (found in the Web Mining extension) and then connects the operator's output to the process results.

 

4.  The only hard part here (and the only thing that changes from API to API) is how you set up this operator.  For Google Cloud APIs, you will set it up like this:

 

 

The URL is cut off in the screenshot; in full it is https://translation.googleapis.com/language/translate/v2/detect?key=<your Google Cloud key>.

 

Note that I am using POST rather than GET requests.  GET requests put the text in the URL, which has a length limit that your text will most likely exceed.  Also note that I am putting the API key in the URL rather than in the header of the request ("request properties").  Usually you can do this either way, but sometimes it does not work in the header.  Go figure.

 

 

 

 

 

In the body, you will create this small JSON file (assuming your attribute is named "text"):

 

{
"q": "<%text%>"
}
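For readers who want to see the same request outside RapidMiner, here is a minimal Python sketch of what the operator is being configured to send: a POST with the key in the URL and the text in a JSON body. The function name build_detect_request and the placeholder key are my own inventions for illustration; the request is only built, never sent.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the article; the API key travels in the query string.
DETECT_URL = "https://translation.googleapis.com/language/translate/v2/detect"


def build_detect_request(text, api_key):
    """Build (but do not send) a POST request asking for language detection."""
    url = DETECT_URL + "?" + urllib.parse.urlencode({"key": api_key})
    # json.dumps escapes quotes and newlines inside the text, which a
    # hand-written body template does not do.
    body = json.dumps({"q": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real key.
    req = build_detect_request('She said "bonjour"', "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

One thing the sketch makes visible: text containing quote characters has to be escaped before it lands in the body, which may be one source of the "strange values" errors that show up on real data.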

 

In the JSONPath queries parameter, you select which part of the response you want.  For language detection, enter $..language as the query expression.  You can name the resulting attribute anything you like.
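If the $.. syntax is unfamiliar: it means "find this key at any depth". A rough Python equivalent is sketched below. The helper find_all is hypothetical, and the sample response is only my assumption of what a detect reply typically looks like; the search itself does not depend on the exact nesting.

```python
import json


def find_all(node, key):
    """Collect every value stored under `key` at any depth, like JSONPath `$..key`."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Illustrative response, shaped like a typical detect reply (an assumption).
sample = json.loads(
    '{"data": {"detections": [[{"language": "fr", "isReliable": false, "confidence": 0.97}]]}}'
)
print(find_all(sample, "language"))  # prints ['fr']
```

Because the query searches the whole tree, you do not need to know how deeply the API nests its result, which is why the same style of expression carries over to other endpoints.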

 

5. Run your process.  It should work nicely UNLESS your data, like mine, has strange values in it that at some point cause an error.  You will want to skip over the offending example and keep going.  Otherwise, if you have 10,000 examples and the API works until the 9,996th example and then hits an error, you will lose all of the earlier API results (but still pay for them).  So it is more prudent to do something like this:
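As a rough illustration of the skip-and-continue idea (this is not the RapidMiner process itself), here is a small Python sketch. The names enrich_all and fake_detect are hypothetical; fake_detect merely stands in for the real web service call so the example runs without a key.

```python
def enrich_all(texts, call_api):
    """Call `call_api` once per text; keep going when a single call fails."""
    results = []
    for i, text in enumerate(texts):
        try:
            results.append({"row": i, "value": call_api(text), "error": None})
        except Exception as exc:  # one bad row must not discard the rest
            results.append({"row": i, "value": None, "error": str(exc)})
    return results


def fake_detect(text):
    """Stand-in for the real web service call (hypothetical)."""
    if not text.strip():
        raise ValueError("empty text")
    return "en"


rows = enrich_all(["hello", "   ", "goodbye"], fake_detect)
failed = [r["row"] for r in rows if r["error"] is not None]
print(len(rows), failed)  # prints: 3 [1]
```

Every row gets a result record, so the rows that succeeded before and after the failure are kept and the failed ones can be retried later.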

 

 

That's about it.  For reference, here are two other useful applications, and the changes you would need to make:

 

To translate text from one language to another

 

url: https://translation.googleapis.com/language/translate/v2?key=<your Google Cloud key>

body:

{
"q": "<%text%>",
"target": "en",
"format": "text"
}

 

jsonpath query: $..translatedText
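For comparison, here is a small Python sketch of the translate body and of picking translatedText out of a reply. Both function names are hypothetical, and the data/translations nesting in the sample is my assumption about the response shape rather than something shown in this article.

```python
import json


def build_translate_body(text, target="en"):
    """Serialize the translate request body with the fields listed above."""
    return json.dumps({"q": text, "target": target, "format": "text"})


def translated_texts(response):
    """Return each translatedText value from a parsed response."""
    return [t["translatedText"] for t in response["data"]["translations"]]


# Illustrative response; the data/translations nesting is an assumption.
sample = {"data": {"translations": [
    {"translatedText": "Good morning", "detectedSourceLanguage": "de"}
]}}

print(build_translate_body("Guten Morgen"))
print(translated_texts(sample))
```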

 

To calculate the driving distance between two addresses (I do this via a GET request because the text is short)

 

url: https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial&origins=<%LocationAddress%>&d...>

request properties:        key           <your Google Cloud API key>

query type: XPath

attribute type: nominal

xpath query: //distance/text/text()
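Two details of this setup can be sketched in Python: addresses must be URL-encoded when they go into a GET query string, and the XPath query picks the human-readable distance out of the XML reply. The addresses and the XML below are made up for illustration, and the destinations parameter name is my assumption about the Distance Matrix API, since the URL above is truncated.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# URL-encode the addresses before placing them in a GET query string.
# `destinations` is an assumed parameter name; the article's URL is truncated.
query = urllib.parse.urlencode({
    "units": "imperial",
    "origins": "1 Main St, Boston, MA",
    "destinations": "5 Elm St, Chicago, IL",
})
print(query)

# Illustrative response fragment (made up for this sketch).
SAMPLE_XML = """<DistanceMatrixResponse>
  <status>OK</status>
  <row>
    <element>
      <status>OK</status>
      <duration><value>54000</value><text>15 hours 0 mins</text></duration>
      <distance><value>1580000</value><text>982 mi</text></distance>
    </element>
  </row>
</DistanceMatrixResponse>"""


def distance_texts(xml_string):
    """Mimic //distance/text/text(): text of every <text> under a <distance>."""
    root = ET.fromstring(xml_string)
    return [node.text for node in root.findall(".//distance/text")]


print(distance_texts(SAMPLE_XML))  # prints ['982 mi']
```

The result is a string such as "982 mi" rather than a number, which is consistent with setting the attribute type to nominal.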

 

Happy Googling.