Beta Feature: AI data validation tests are currently in beta. The functionality and interface may change in future releases.

Version Requirement: This feature requires Elementary dbt package version 0.18.0 or above.
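
One way to meet this requirement is to pin the package version in packages.yml. This is a minimal sketch, assuming the package is installed from the dbt package hub under the name elementary-data/elementary; the open-ended version range is only an illustration, so adjust it to your own pinning policy.

```yaml
# packages.yml (sketch; assumes installation from the dbt package hub)
packages:
  - package: elementary-data/elementary
    version: [">=0.18.0"]  # minimum version stated above; add an upper bound if you pin strictly
```

After editing packages.yml, run dbt deps to install or upgrade the package.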

AI Data Validation with Elementary

What is AI Data Validation?

Elementary’s elementary.ai_data_validation test lets you validate any data column using large language models (LLMs). It is more flexible than traditional tests because it can be applied to any column type and uses natural language to define validation rules.

With ai_data_validation, you can simply describe what you expect from your data in plain English, and Elementary will check if your data meets those expectations. This is particularly useful for complex validation rules that would be difficult to express with traditional SQL or dbt tests.

How It Works

Elementary leverages the AI and LLM capabilities built directly into your data warehouse. When you run a validation test:

  1. Your data stays within your data warehouse
  2. The warehouse’s built-in AI and LLM functions analyze the data
  3. Elementary reports whether each value meets your expectations based on the prompt

Required Setup for Each Data Warehouse

Before you can use Elementary’s AI data validations, you need to set up AI and LLM capabilities in your data warehouse:

Snowflake

Databricks

BigQuery

Redshift

  • Support coming soon

Data Lakes

Using the AI Data Validation Test

The test requires one main parameter:

  • expectation_prompt: Describe what you expect from the data in plain English

Optionally, you can also specify:

  • llm_model_name: Specify which AI model to use (see recommendations above for each warehouse)

This test works with any column type, as the data will be converted to a string format for validation. This enables natural language data validations for dates, numbers, and other structured data types.

version: 2

models:
  - name: < model name >
    columns:
      - name: < column name >
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "Description of what the data should satisfy"
              llm_model_name: "model_name"  # Optional
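
As a concrete illustration, the template above might be filled in as follows. The model name orders, both column names, and both prompts are hypothetical and chosen only for this example. llm_model_name is left out because it is optional.

```yaml
version: 2

models:
  - name: orders  # hypothetical model name
    columns:
      - name: order_date  # hypothetical date column
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The order date should not be in the future"
      - name: discount_percent  # hypothetical numeric column
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The discount should be a percentage between 0 and 100"
```

Because values are converted to strings before validation, the same test applies to the date column and the numeric column without any type-specific configuration.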