BigQuery Vertex AI
Learn how to configure BigQuery to use Vertex AI models for unstructured data validation tests
BigQuery Setup for Unstructured Data Tests
Elementary’s unstructured data validation tests leverage BigQuery ML and Vertex AI models to perform advanced AI-powered validations. This guide will walk you through the setup process.
Prerequisites
Before you begin, ensure you have:
- A Google Cloud account with appropriate permissions
- Access to BigQuery and Vertex AI services
- A BigQuery dataset in which to create the model used by Elementary’s data validation tests. This should be the dataset that stores the unstructured data you want to validate.
Step 1: Enable the Vertex AI API
- Navigate to the Google Cloud Console
- Go to APIs & Services > API Library
- Search for “Vertex AI API”
- Click on the API and select Enable
Step 2: Create a Remote Connection to Vertex AI
Elementary’s unstructured data validation tests use BigQuery ML to access pre-trained Vertex AI models. To establish this connection:
- Navigate to the Google Cloud Console > BigQuery
- In the Explorer panel, click the + button
- Select Connections to external data sources
- Change the connection type to Vertex AI remote models, remote functions and BigLake (Cloud Resource)
- Select the appropriate region:
  - If your model and dataset are in the same region, select that specific region
  - Otherwise, select multi-region
After creating the connection:
- In the BigQuery Explorer, navigate to External Connections
- Find and click on your newly created connection
- Copy the Service Account ID for the next step
Step 3: Grant Vertex AI Access Permissions
Now you need to give the connection’s service account permission to access Vertex AI:
- In the Google Cloud Console, go to IAM & Admin
- Click + Grant Access
- Under “New principals”, paste the service account ID you copied
- Assign the Vertex AI User role
- Click Save
Step 4: Create an LLM Model Interface in BigQuery
- In the BigQuery Explorer, navigate to External Connections
- Find the connection you created in Step 2 and click on it
- Copy the Connection ID (format: projects/<project-name>/locations/<region>/connections/<connection-name>)
- Select a model endpoint. You can use gemini-1.5-pro-002 as a default endpoint.
- Run the following SQL query to create a model in your dataset:
Example
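A minimal sketch of the kind of statement this step calls for, using BigQuery ML's CREATE MODEL ... REMOTE WITH CONNECTION syntax. The names my-project, my_dataset, gemini_model, and my-vertex-connection, and the us location, are placeholders assumed for illustration, not values from this guide. BigQuery's documentation writes the connection as project.location.connection-name, so the sketch uses that dotted form of the Connection ID.

```sql
-- Placeholder names throughout: replace the project, dataset, model,
-- location, and connection with your own values.
CREATE OR REPLACE MODEL `my-project.my_dataset.gemini_model`
  -- The Cloud Resource connection created in Step 2.
  REMOTE WITH CONNECTION `my-project.us.my-vertex-connection`
  -- The Vertex AI model endpoint selected above.
  OPTIONS (ENDPOINT = 'gemini-1.5-pro-002');
```

The model is created in the dataset named in the first identifier, so that dataset should be the one holding the unstructured data you plan to validate.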
Note: During development, we used gemini-1.5-pro and recommend it as the default model for unstructured data tests in BigQuery.
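Before wiring the model into a test, it can help to confirm that the connection and permissions work. The query below is one hedged way to do that with BigQuery ML's ML.GENERATE_TEXT function; the model name is a hypothetical placeholder and the prompt is arbitrary.

```sql
-- Smoke test: send one fixed prompt through the remote model.
-- `my-project.my_dataset.gemini_model` is a placeholder; use your model's name.
SELECT
  ml_generate_text_llm_result,  -- generated text (present when flatten_json_output is TRUE)
  ml_generate_text_status       -- empty when the call succeeded
FROM
  ML.GENERATE_TEXT(
    MODEL `my-project.my_dataset.gemini_model`,
    (SELECT 'Reply with the single word OK.' AS prompt),
    STRUCT(TRUE AS flatten_json_output)
  );
```

If the query fails with a permission error, check that the connection's service account was granted the Vertex AI User role in Step 3; newly granted roles can take a short while to take effect.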
Additional Resources
Step 5: Running an Unstructured Data Test
Once your model is set up, you can reference it in your Elementary tests: