The Google BigQuery connector allows users to perform different operations on a Google BigQuery instance from Anypoint Platform/Anypoint Studio. BigQuery is Google’s fully managed, petabyte-scale, low-cost analytics data warehouse. The connector exposes Google BigQuery operations by executing the corresponding API calls according to its configuration, and supports operations related to Datasets, Tables, Jobs, and more.
Connector Namespace and Schema
While designing a Mule application in Anypoint Studio, when users drag the connector from the palette onto the Anypoint Studio canvas, Studio automatically populates the XML code with the connector namespace and schema location.
Schema Location: http://www.mulesoft.org/schema/mule/bigquery/current/mule-bigquery.xsd
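For reference, the generated XML typically declares the connector namespace alongside the schema location, roughly as in the sketch below (other namespaces that Studio adds are omitted here):

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:bigquery="http://www.mulesoft.org/schema/mule/bigquery"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="
        http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/bigquery http://www.mulesoft.org/schema/mule/bigquery/current/mule-bigquery.xsd">

    <!-- flows using bigquery: operations go here -->

</mule>
```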
Maven Dependency Information
After you download and install the connector, the following steps make the Google BigQuery connector available inside a Mule application and package the application with the connector. If you are using Anypoint Studio, it does this automatically. For Maven dependency management, include this XML snippet in the pom.xml file of the Mule project.
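The dependency generally takes the following shape. The group ID, artifact ID, and version shown here are assumptions — check Anypoint Exchange for the exact coordinates of your connector version:

```xml
<dependency>
    <!-- Coordinates below are illustrative; verify them on Anypoint Exchange -->
    <groupId>com.mulesoft.connectors</groupId>
    <artifactId>mule-bigquery-connector</artifactId>
    <version>x.y.z</version>
    <classifier>mule-plugin</classifier>
</dependency>
```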
Creating a New Project
To use the BigQuery connector in a Mule application project:
1. In Anypoint Studio, click File > New > Mule Project.
2. Select Mule Project from the dialog box.
3. Enter a project name and specify the Runtime, API Implementation, and Project Location if needed.
4. Click Finish to create the project.
Create a Google Service Account
There are a couple of ways to set up authentication with Google for the driver to work. For this article, we will configure the driver to authenticate the connection with a Google service account. The service account handles the authentication process, so no user input is required.
In your Google Cloud account, navigate to Service accounts under IAM & admin.
Click on Create Service Account at the top.
Fill in all the fields to describe the service account:
Write down the Service account ID that is auto-generated; we will need it later when we set up the configuration file. Next, click Create, and then click Continue on the next screen for Step 2.
At the bottom of the screen in Step 3, click Create Key.
Select any key type you would like. For this article, I selected *.p12. Click Create and it will download the file to your machine. Write down the password if you would like; we will not use it for this article but if you intend to use this key for other projects, you will need it. Click Close and then click Done.
Next, find the *.p12 key file that was downloaded and copy and paste that into the src/main/resources folder of your project back in Anypoint Studio.
Now that we have this information, we can set up the configuration properties file for the Mule project.
Setup Configuration Properties
Back in Studio, right click on the src/main/resources folder and create a new file.
Give the file a name (e.g. mule-application.properties) and then click Finish.
To access the BigQuery instance, the user needs to configure the service account key as described below.
SERVICE ACCOUNT KEY
The following parameters are required for the SERVICE ACCOUNT KEY configuration:

| Parameter | Description |
|---|---|
| Project ID | The Project ID of your Google Cloud Platform project (if any), or create a new project on Google Cloud Platform. |
| Service Account Key (JSON) File | The key file associated with the Google Cloud Platform project. Download the file, place it in the BASE_LOCATION_OF_PROJECT/src/main/resources folder, and provide its relative path. |
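A connector configuration using these parameters might look roughly like the following in the Mule XML. The element and attribute names here are assumptions and may differ by connector version — consult the connector's reference documentation for the exact syntax:

```xml
<!-- Sketch only: element and attribute names are assumptions -->
<bigquery:config name="Google_BigQuery_Config">
    <bigquery:service-account-key-connection
        projectId="my-gcp-project"
        serviceAccountKeyFileName="service-account-key.json"/>
</bigquery:config>
```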
Insert Data to BigQuery Table and Extract it to Google Cloud Storage
This use case demonstrates inserting data into a BigQuery table and extracting it to Google Cloud Storage. The flow creates a dataset and table, inserts data into the table, and extracts the data to Google Cloud Storage. It also gets the details of the extract job and checks the job execution status. Finally, it deletes the dataset and table once the extract job completes its execution.
- In Anypoint Studio, click File > New > Mule Project, name the project, and click Finish.
- In the search field, type HTTP and drag the HTTP Listener to the canvas.
- Click the HTTP Listener, then click the green plus sign at the right of Connector Configuration. On the next screen, add the host and port, then click OK.
- Provide a Path to the HTTP Listener (for example: /extractData).
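After these steps, the generated XML typically resembles the following sketch; the host, port, and flow name here are illustrative values, not taken from the article:

```xml
<http:listener-config name="HTTP_Listener_config">
    <http:listener-connection host="0.0.0.0" port="8081"/>
</http:listener-config>

<flow name="extractDataFlow">
    <http:listener config-ref="HTTP_Listener_config" path="/extractData"/>
    <!-- BigQuery operations are added in the following steps -->
</flow>
```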
- On the Mule Palette, search for Create Dataset and drag it onto the canvas.
- Provide appropriate dataset name and other fields.
- Add Set Payload onto the canvas. Set the JSON below, which defines the table fields/schema, as an expression for the payload value.
- We used the following table schema:

| Field name | Field type |
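The schema rows from the original table are not reproduced here. Purely as an illustration (the field names, key names, and types are hypothetical, not the article's actual schema), a payload JSON describing a two-field schema might look like:

```json
[
    { "fieldName": "Name", "fieldType": "STRING" },
    { "fieldName": "Age", "fieldType": "INTEGER" }
]
```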
- Add Create Table onto the canvas. Provide the Dataset Name and Table Name, and map the payload to the Table Fields using the DataWeave script below.
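The script referenced here is not included in the text. A sketch of such a mapping, assuming the payload holds a list of objects with fieldName/fieldType keys (an assumption about the payload shape), might be:

```dataweave
%dw 2.0
output application/java
---
// Map each schema entry to the structure expected by Create Table (assumed shape)
payload map ((field) -> {
    fieldName: field.fieldName,
    fieldType: field.fieldType
})
```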
- Add Set Payload to the canvas and set the JSON below as its value. This JSON contains a value for each field in the table schema, plus a RowId to specify the row index of the data.
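The JSON itself is not shown in the text. As a hypothetical illustration only (field names and structure are assumptions, not the article's actual data):

```json
[
    { "RowId": "1", "Name": "Alice", "Age": 30 },
    { "RowId": "2", "Name": "Bob", "Age": 25 }
]
```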
- Add Transform Message to the canvas and add the DataWeave script below. It transforms the JSON payload into row data for the Insert All operation. The output of Transform Message is used to insert data into the table.
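The script is not included in the text; the following is a sketch of what such a transformation could look like. The input shape and the exact structure Insert All expects are assumptions:

```dataweave
%dw 2.0
output application/java
---
// Split each record into a row id and the remaining row content (assumed target shape)
payload map ((row) -> {
    rowId: row.RowId,
    rowContent: row - "RowId"
})
```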
- Search for Insert All and drag it onto the canvas. Provide the Dataset Name and Table Name.
- Set the payload of Transform Message as an expression for Rows data, or set the above DataWeave script directly into Rows data as an expression.
- Drag the Logger onto the canvas and log payload as expression to log the output of the operation.
- Drag Flow Reference onto the canvas and provide the flow name of the next sub flow.
- Now, create a new sub flow. Drag Create Job onto the canvas and select Extract Job as the job config.
- Provide the Destination URIs (the URIs of the Google Cloud Storage location to which you want to extract the Google BigQuery table data).
- Provide the Source Table and Dataset.
- Note: For Extract Job operation, only simple types may be exported as CSV.
- Add Set Variable, set the variable name to JobId, and set the value below as an expression.
- This JobId variable can be used by other operations to get the details of the job. In this use case, it is used by Get Job.
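The expression itself is not shown in the text. One plausible sketch, assuming the Create Job payload exposes a job reference similar to BigQuery's REST API response (an assumption about the connector's payload shape):

```xml
<!-- payload.jobReference.jobId is an assumed path into the Create Job response -->
<set-variable variableName="JobId" value="#[payload.jobReference.jobId]"/>
```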
- Drag Flow Reference and provide flow name of next sub flow
- Create a new sub flow, search for Get Job, drag it onto the canvas, and provide the jobName as an expression.
- Drag Set Variable, set the variable name to jobStatus, and set the value below as an expression.
- This jobStatus variable can be used to check the status of the Job.
- Add Choice from Mule Palette to the canvas. It is used to switch between different choices depending on provided expression.
- Add the condition below as an expression in the Choice to check whether the job has completed its execution.
- On the expression/when side, add a Logger to log jobStatus, and add a Flow Reference to call another sub flow, deleteFlow, to delete the resources.
- On the default side of the Choice, again add a Logger to log jobStatus as an expression.
- As the job is still running, add a Flow Reference to call the same flow.
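Putting the Choice steps together, the router could look roughly like the following sketch. "DONE" is BigQuery's terminal job state, but the exact contents of jobStatus depend on the Get Job payload, and the getJobFlow name here is a placeholder:

```xml
<choice>
    <when expression='#[vars.jobStatus == "DONE"]'>
        <logger level="INFO" message="#[vars.jobStatus]"/>
        <!-- Job finished: clean up resources -->
        <flow-ref name="deleteFlow"/>
    </when>
    <otherwise>
        <logger level="INFO" message="#[vars.jobStatus]"/>
        <!-- Job still running: poll again (flow name is illustrative) -->
        <flow-ref name="getJobFlow"/>
    </otherwise>
</choice>
```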
- Create a new sub flow deleteFlow to delete the resources.
- Drag Delete Table and provide the table name and dataset name.
- Drag Delete Dataset and provide dataset name.
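The deleteFlow sub flow could then look roughly like this; the BigQuery operation element and attribute names are assumptions — consult the connector's XML reference for the exact names:

```xml
<sub-flow name="deleteFlow">
    <!-- Element/attribute names below are assumptions, not verified connector syntax -->
    <bigquery:delete-table config-ref="Google_BigQuery_Config"
        datasetName="myDataset" tableName="myTable"/>
    <bigquery:delete-dataset config-ref="Google_BigQuery_Config"
        datasetName="myDataset"/>
</sub-flow>
```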
- After you create the flow, right-click the project and click Run As > Mule Application
- Once the application is deployed, use HTTP Listener’s Path to execute the Mule Flow.