Connect Google BigQuery to RapidMiner using the Simba JDBC driver
Step 1 ) Download Driver
First, we need to download the Simba JDBC driver (NOT the ODBC) from
here: http://www.simba.com/drivers/bigquery-odbc-jdbc/
Downloading and unzipping the SimbaJDBCDriverforGoogleBigQuery42_1.1.0.1000.zip file will give you the following files:
Unzip it into a folder for good housekeeping.
Step 2) Place Drivers in RapidMiner JDBC drivers folder
Navigate to: C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc and copy the folder with the above files there. It should look like this:
At this point, we need to shut down RapidMiner Studio.
Then we need to navigate to the .RapidMiner folder which is under C:\Users\<your name>\.RapidMiner and locate the jdbc_properties.xml file.
Make a backup copy of the jdbc_properties.xml file and then open it in a text editor.
Unless you have other drivers installed, it will either be empty or look like this:
<?xml version="1.0" encoding="UTF-8"?>
<drivers>
PASTE HERE
</drivers>
Then you can copy the definitions below and paste them replacing “PASTE HERE”. Make sure the file path matches your actual .RapidMiner…lib path where the .jar files are saved and you will have something looking very much like this:
<?xml version="1.0" encoding="UTF-8"?>
<drivers>
<driver urlprefix="jdbc:bigquery://" name="BigQuerySIMBA" drivers="com.simba.googlebigquery.jdbc42.Driver" driver_jar="C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\GoogleBigQueryJDBC42.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-services-bigquery-v2-rev320-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-jackson2-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-oauth-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\jackson-core-2.1.3.jar" defaultport="443" dbnameseparator=""/>
</drivers>
There is wrapping in the format here but if you open the jdbc_properties.xml file with notepad, it should all be in a single line. There is no need for any special arrangement of the text, so, it may be better to keep text wrapping to a minimum to avoid problems.
Step 3)Configure New JDBC Driver connection type
It is now time to start RapidMiner Studio and perform the following actions:
- Open Connections > Manage Database Drivers > Add to add a driver
- “Name” your new connection something like BigQuerySimba
- Set the “URL Prefix” to: jdbc:bigquery:// , “Port” 443 and leave the “Schema Separator” blank.
- The “Driver Class” should be: com.simba.googlebigquery.jdbc42.Driver
- And the “Jar file” field should contain the following list in one line:
C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\GoogleBigQueryJDBC42.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-api-services-bigquery-v2-rev320-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-http-client-jackson2-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\google-oauth-client-1.22.0.jar,C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\SimbaJDBCDriverforGoogleBigQuery42_1.0.6.1008\jackson-core-2.1.3.jar
You may recognise this comma separated list from above, again, make sure that your file paths are the ones you used.
Click “Save” and close.
Step 4) Create Database Connection (Please see step 5 for preferred connection technique)
Now we need to create a database connection.
Navigate to Connections > Manage Database Connections and create a new connection called something relevant; it does not need to have the same name as the driver connection set up above. Here, it is named “ConnectToBigQuerySIMBA”
Choose the BigQuerySimba from the drop-down for “Database system”
For “Host” enter this string:
https://www.googleapis.com/bigquery/v2:443;ProjectId=<your-project-name>;OAuthType=1
for more information on this, please read the Simba documentation pdf file that came with the driver: “Simba JDBC Driver for Google BigQuery Install and Configuration Guide”
“Port” and “Database scheme” remain empty.
For “User”, you will have to enter the user email from Google and the associated password and hit “Save”.
The setup is now complete for testing purposes.
When you click “Test” you will be prompted as follows for a key:
Copy the URL, paste it in your browser and navigate to it. You will be asked to approve certain permissions and then you will be given a one-time key that looks like this:
Once you copy this code, you paste it into the text box replacing the URL there as so:
And your connection should succeed:
Step 5) Using a Google Service Account (Preferred)
You can configure the driver to authenticate the connection with a Google service account. The service account can handle the authentication process so that no user input is required.
You must provide a Google service account email address and the full path to a private key file for the service account. You can download the private key file from the Google API console web page. For more information about OAuth authentication using a service account, see "Using OAuth 2.0 for Server to Server Applications" in the Google Developers documentation:
https://developers.google.com/identity/protocols/OAuth2ServiceAccount.
You provide this information to the driver in the connection URL.
To configure service account authentication:
1. Set the OAuthType property to 0.
2. Set the ProjectID property to the name of your bigquery project.
3. Set the OAuthServiceAcctEmail property to your Google service account email address.
4. Set the OAuthPvtKeyPath property to the full path to the key file that is used to authenticate the service account email address. This paramater supports both .pl2 or .json formatted keys.
For example:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;P
rojectId=MyBigQueryProject;OAuthType=0;OAuthServiceAcctEmail
==bqtest1@data-drivertesting.iam.gserviceaccount.com;OAuthPvtKeyPath=C:\SecureFil
es\ServiceKeyFile.p12;