Add advanced schema tests
After you install the Elementary dbt package, you can use its advanced schema tests to validate data. These tests are configured and executed like any other test in your dbt project.
Available advanced schema tests
Table (model / source) tests
Schema changes
elementary.schema_changes
Runs a schema changes monitor that alerts on a deleted table, deleted or added columns, or a change in a column's data type.
Data types note: Snowflake, Redshift, and Postgres all have “synonym” data types, such as VARCHAR and TEXT in Snowflake. The tests consider these equal; types are converted internally to handle this.
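To illustrate the idea of synonym handling, here is a minimal Python sketch of normalizing type names before comparison. This is not Elementary's actual code, and the synonym groups below are assumptions for illustration; the real mapping is warehouse-specific.

```python
# Assumed synonym groups (illustrative only -- the real mapping is
# warehouse-specific and lives inside the Elementary package).
SYNONYMS = {
    "TEXT": "STRING",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "STRING": "STRING",
    "INT": "INTEGER",
    "INTEGER": "INTEGER",
    "BIGINT": "INTEGER",
}

def normalize_type(data_type: str) -> str:
    """Strip length qualifiers like VARCHAR(255) and map synonyms."""
    base = data_type.upper().split("(")[0].strip()
    return SYNONYMS.get(base, base)

def types_equal(a: str, b: str) -> bool:
    """Two types are 'equal' if they normalize to the same canonical type."""
    return normalize_type(a) == normalize_type(b)

print(types_equal("VARCHAR(255)", "text"))  # True
print(types_equal("INT", "FLOAT"))          # False
```

With normalization like this, a column migrating from TEXT to VARCHAR does not trigger a data-type-change alert.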
Not supported in Databricks.
```yaml
version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.schema_changes
```
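For instance, a filled-in configuration might look like the following. The model name `orders` and timestamp column `updated_at` are hypothetical placeholders, not names from your project.

```yaml
version: 2

models:
  - name: orders                       # hypothetical model name
    config:
      elementary:
        timestamp_column: updated_at   # hypothetical timestamp column
    tests:
      - elementary.schema_changes
```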
Schema changes from baseline
elementary.schema_changes_from_baseline
Checks for schema changes against baseline columns defined in a source's or model's configuration. For this test to work, the configuration must include the columns and their data types.
The initial configuration needed for this test can be auto-generated (see details below).
Supported parameters for the test:
- fail_on_added - If set, the test fails if the table contains columns that do not exist in the baseline (default: False, meaning added columns won't cause the test to fail).
- enforce_types - If set, the test raises an error if any column is defined without a data type (default: False, in which case the test only verifies that the column exists, not its type).
Not supported in Databricks.
```yaml
version: 2

sources:
  - name: < source name >
    database: < database name >
    schema: < schema name >
    tables:
      - name: < table name >
        columns:
          - name: < column 1 >
            data_type: < data type 1 >
          - name: < column 2 >
            data_type: < data type 2 >
        tests:
          - elementary.schema_changes_from_baseline
```
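As a concrete sketch, a baseline configuration with both optional parameters enabled might look like this. The source, database, schema, table, and column names below are hypothetical examples.

```yaml
version: 2

sources:
  - name: raw                # hypothetical source name
    database: analytics      # hypothetical database
    schema: raw_data         # hypothetical schema
    tables:
      - name: orders
        columns:
          - name: order_id
            data_type: integer
          - name: customer_email
            data_type: string
        tests:
          - elementary.schema_changes_from_baseline:
              fail_on_added: true
              enforce_types: true
```

With this configuration, a new column appearing in `orders` fails the test, and any baseline column listed without a `data_type` raises an error.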
Column test
JSON schema
elementary.json_schema
Allows validating that a string column matches a given JSON schema. The test expects a JSON schema as input, written according to the JSON Schema standard and expressed in YAML format (see the example below).
This test along with the relevant JSON schema can be auto-generated.
Note: This test relies on our Python tests capability and is currently supported only on Snowflake and BigQuery data warehouses.
```yaml
version: 2

models:
  - name: < model name >
    columns:
      - name: < column name >
        tests:
          - elementary.json_schema: < json schema parameters >
```
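A filled-in configuration might look like the following. The model name `login_events`, column `raw_event_data`, and the schema itself are hypothetical; the schema parameters follow the JSON Schema standard in YAML form.

```yaml
version: 2

models:
  - name: login_events          # hypothetical model
    columns:
      - name: raw_event_data    # hypothetical string column holding JSON
        tests:
          - elementary.json_schema:
              type: object
              properties:
                user_id:
                  type: integer
              required:
                - user_id
```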
Auto-generate schema tests
To make schema tests easier to configure, Elementary provides dbt operations that auto-generate test configurations based on existing schemas.
Auto-generate baseline schema tests
For the schema changes from baseline test to work, a baseline needs to be generated from an initial state of the table and added to the configuration of the source / model under “columns”. The baseline consists of the name and data type of each column.
To generate the baseline, Elementary provides the generate_schema_baseline_test macro. By default, running it generates a schema_changes_from_baseline test for all sources, but it can be customized with the following arguments:
- name - run on a specific source / model
- include_models - whether or not to generate tests for models (default: false)
- include_sources - whether or not to generate tests for sources (default: true)
- fail_on_added - if set, the “fail_on_added” parameter is added to the generated tests' configuration with the supplied setting
- enforce_types - if set, the “enforce_types” parameter is added to the generated tests' configuration with the supplied setting
Examples:
```shell
# Generate a schema changes from baseline test for all sources
dbt run-operation elementary.generate_schema_baseline_test

# Generate a schema changes from baseline test for a specific model / source named "orders"
dbt run-operation elementary.generate_schema_baseline_test --args '{"name": "orders"}'

# Generate a schema changes from baseline test for all sources and all models
dbt run-operation elementary.generate_schema_baseline_test --args '{"include_models": true}'

# Generate a schema changes from baseline test with the "fail_on_added" and "enforce_types" parameters set to true
dbt run-operation elementary.generate_schema_baseline_test --args '{"fail_on_added": true, "enforce_types": true}'
```
Auto-generate JSON schema tests
Elementary provides the generate_json_schema_test macro to auto-generate the JSON schema for a given column using existing data.
Example usage:
```shell
dbt run-operation elementary.generate_json_schema_test --args '{"node_name": "login_events", "column_name": "raw_event_data"}'
```
Will print:
```
Please add the following test to your model configuration:
----------------------------------------------------------

columns:
  - name: raw_event_data
    tests:
      - elementary.json_schema:
          type: object
          properties:
            event_id:
              type: integer
            event_name:
              type: string
            event_args:
              type: array
              items:
                type: string
          required:
            - event_id
            - event_name
```
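To see what a schema like the generated one above enforces, here is a stdlib-only Python sketch that mimics its checks by hand. The real elementary.json_schema test applies the full JSON Schema standard via a validator; this hand-rolled version covers only this specific example and is purely illustrative.

```python
import json

def matches_schema(raw: str) -> bool:
    """Mimic the generated schema: an object with a required integer
    event_id, a required string event_name, and an optional event_args
    array of strings. Illustration only, not Elementary's validator."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(event, dict):
        return False
    # Required fields with expected types
    if not isinstance(event.get("event_id"), int):
        return False
    if not isinstance(event.get("event_name"), str):
        return False
    # Optional array of strings
    args = event.get("event_args", [])
    return isinstance(args, list) and all(isinstance(a, str) for a in args)

print(matches_schema('{"event_id": 1, "event_name": "login"}'))  # True
print(matches_schema('{"event_name": "login"}'))                 # False: missing event_id
```

Rows in the tested column whose JSON values fail checks like these are the ones the test would flag.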
Note: The generate_json_schema_test macro relies on a 3rd-party Python library called genson. If you are using BigQuery, you will need to pre-install this library in your Dataproc cluster (see dbt's documentation on Python models for more details).