AI Data Validations
Beta Feature: AI data validation tests are currently in beta. The functionality and interface may change in future releases.
Version Requirement: This feature requires Elementary dbt package version 0.18.0 or above.
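As a rough sketch, the version requirement can be expressed in your dbt project's packages.yml. The package name is the one Elementary publishes on the dbt package hub; the open-ended version range is an assumption, so pin it to whichever release you actually intend to use.

```yaml
# packages.yml (illustrative; adjust the range to the release you want)
packages:
  - package: elementary-data/elementary
    # Assumed range: 0.18.0 is the minimum stated above
    version: [">=0.18.0"]
```

After editing the file, run `dbt deps` to install or upgrade the package.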
AI Data Validation with Elementary
What is AI Data Validation?
Elementary’s elementary.ai_data_validation test lets you validate any data column using large language models (LLMs). It is more flexible than traditional tests: it can be applied to any column type, and validation rules are defined in natural language.
With ai_data_validation, you can simply describe what you expect from your data in plain English, and Elementary will check if your data meets those expectations. This is particularly useful for complex validation rules that would be difficult to express with traditional SQL or dbt tests.
How It Works
Elementary leverages the AI and LLM capabilities built directly into your data warehouse. When you run a validation test:
- Your data stays within your data warehouse
- The warehouse’s built-in AI and LLM functions analyze the data
- Elementary reports whether each value meets your expectations based on the prompt
Required Setup for Each Data Warehouse
Before you can use Elementary’s AI data validations, you need to set up AI and LLM capabilities in your data warehouse:
Snowflake
- Prerequisite: Enable Snowflake Cortex AI LLM functions
- Recommended Model: claude-3-5-sonnet
- View Snowflake’s Guide
Databricks
- Prerequisite: Ensure Databricks AI Functions are available
- Recommended Model: databricks-meta-llama-3-3-70b-instruct
- View Databricks’ Setup Guide
BigQuery
- Prerequisite: Configure BigQuery to use Vertex AI models
- Recommended Model: gemini-1.5-pro
- View BigQuery’s Setup Guide
Redshift
- Support coming soon
Data Lakes
- Currently supported through Snowflake, Databricks, or BigQuery external object tables
- View Data Lakes Information
Using the AI Data Validation Test
The test requires one main parameter:
- expectation_prompt: Describe what you expect from the data in plain English
Optionally, you can also specify:
- llm_model_name: Specify which AI model to use (see the recommendations above for each warehouse)
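A minimal sketch of how the two parameters might appear in a dbt schema file. The model name, column name, and prompt text are hypothetical placeholders, and the model shown is the Snowflake recommendation from above; substitute the one that matches your warehouse.

```yaml
version: 2

models:
  - name: customer_feedback        # hypothetical model name
    columns:
      - name: feedback_text        # hypothetical column name
        tests:
          - elementary.ai_data_validation:
              # Required: the expectation, written in plain English
              expectation_prompt: "The text should be a complete sentence written in English"
              # Optional: which warehouse LLM to use
              llm_model_name: "claude-3-5-sonnet"
```

If llm_model_name is omitted, only expectation_prompt needs to be set.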
This test works with any column type, as the data will be converted to a string format for validation. This enables natural language data validations for dates, numbers, and other structured data types.
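Because values are converted to strings, the same test can be attached to non-text columns. The sketch below assumes a hypothetical contracts model with one date column and one numeric column; the names and prompts are illustrative only.

```yaml
version: 2

models:
  - name: contracts                # hypothetical model name
    columns:
      - name: contract_date        # hypothetical date column
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The contract date should not be in the future"
      - name: discount_percent     # hypothetical numeric column
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The value should be a percentage between 0 and 100"
```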