RapidMiner

Connect Google BigQuery to RapidMiner using the Simba JDBC driver

by RMStaff on ‎08-14-2017 10:33 AM - edited Thursday by RMStaff

 

Step 1 ) Download Driver

BigQuery2-01.pngFirst, we need to download the Simba JDBC driver (NOT the ODBC) from

here: http://www.simba.com/drivers/bigquery-odbc-jdbc/

 

Downloading and unzipping the SimbaJDBCDriverforGoogleBigQuery42_1.1.0.1000.zip file will give you the following files:

Unzip it into a folder for good housekeeping.1.jpg

 

Step 2) Place Drivers in RapidMiner JDBC drivers folder

Navigate to: C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc and copy the folder with the above files there. It should look like this:

 

2.jpg

 

At this point, we need to shut down RapidMiner Studio.

Then we need to navigate to the .RapidMiner folder which is under C:\Users\<your name>\.RapidMiner and locate the jdbc_properties.xml file.

Make a backup copy of the jdbc_properties.xml file and then open it in a text editor.

 

Unless you have other drivers installed, it will either be empty or look like this:

 

<?xml version="1.0" encoding="UTF-8"?>

<drivers>

PASTE HERE

</drivers>

 

Then you can copy the definitions below and paste them replacing “PASTE HERE”. Make sure the file path matches your actual .RapidMiner…lib path where the .jar files are saved and you will have something looking very much like this:

 

<?xml version="1.0" encoding="UTF-8"?>

<drivers>

<driver urlprefix="jdbc:bigquery://" name="BigQuerySIMBA" drivers="com.simba.googlebigquery.jdbc42.Driver" driver_jar="C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\GoogleBigQueryJDBC42.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-services-bigquery-v2-rev320-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-jackson2-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-oauth-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\jackson-core-2.1.3.jar" defaultport="443" dbnameseparator=""/>

</drivers>

 

There is wrapping in the format here but if you open the jdbc_properties.xml file with notepad, it should all be in a single line. There is no need for any special arrangement of the text, so, it may be better to keep text wrapping to a minimum to avoid problems.

 

 Step 3)Configure New JDBC Driver connection type

It is now time to start RapidMiner Studio and perform the following actions:

  1. Open Connections > Manage Database Drivers > Add to add a driver
  2. Name” your new connection something like BigQuerySimba
  3. Set the “URL Prefix” to: jdbc:bigquery:// , “Port” 443 and leave the “Schema Separator” blank.
  4. The “Driver Class” should be: com.simba.googlebigquery.jdbc42.Driver
  5. And the “Jar file” field should contain the following list in one line:

C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\GoogleBigQueryJDBC42.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-services-bigquery-v2-rev320-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-jackson2-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-oauth-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\jackson-core-2.1.3.jar

 

You may recognise this comma separated list from above, again, make sure that your file paths are the ones you used.

Click “Save” and close.

3.jpg

 

Step 4) Create Database Connection (Please see step 5 for preferred connection technique)

Now we need to create a database connection.

 

Navigate to Connections > Manage Database Connections and create a new connection called something relevant; it does not need to have the same name as the driver connection set up above. Here, it is named “ConnectToBigQuerySIMBA”

Choose the BigQuerySimba from the drop-down for “Database system”

For “Host” enter this string:

https://www.googleapis.com/bigquery/v2:443;ProjectId=<your-project-name>;OAuthType=1

for more information on this, please read the Simba documentation pdf file that came with the driver: “Simba JDBC Driver for Google BigQuery Install and Configuration Guide”

Port” and “Database scheme” remain empty.

For “User”, you will have to enter the user email from Google and the associated password and hit “Save”.

 

4.jpg

The setup is now complete for testing purposes.

 

When you click “Test” you will be prompted as follows for a key:

5.jpg

Copy the URL, paste it in your browser and navigate to it. You will be asked to approve certain permissions and then you will be given a one-time key that looks like this:

googlekey.png

Once you copy this code, you paste it into the text box replacing the URL there as so:

2017-09-21 11_54_42-RapidMiner Studio Knowledge Base.png2017-09-21 11_54_42-RapidMiner Studio Knowledge Base.png

 

And your connection should succeed:

8.jpg

 

Step 5) Using a Google Service Account (Preferred)
You can configure the driver to authenticate the connection with a Google service account. The service account can handle the authentication process so that no user input is required.


You must provide a Google service account email address and the full path to a private key file for the service account. You can download the private key file from the Google API console web page. For more information about OAuth authentication using a service account, see "Using OAuth 2.0 for Server to Server Applications" in the Google  Developers documentation:
https://developers.google.com/identity/protocols/OAuth2ServiceAccount.
You provide this information to the driver in the connection URL.


To configure service account authentication:
1. Set the OAuthType property to 0.
2. Set the ProjectID property to the name of your bigquery project.
3. Set the OAuthServiceAcctEmail property to your Google service account email address.
4. Set the OAuthPvtKeyPath property to the full path to the key file that is used to authenticate the service account email address. This paramater supports both .pl2 or .json formatted keys.


For example:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;P
rojectId=MyBigQueryProject;OAuthType=0;OAuthServiceAcctEmail
==bqtest1@data-drivertesting.iam.gserviceaccount.com;OAuthPvtKeyPath=C:\SecureFil
es\ServiceKeyFile.p12;