Guides
Add advanced schema tests

After you install the dbt package you can use Elementary advanced schema tests to validate data. These tests are configured and executed like any other tests in your dbt project.

Available advanced schema tests

Table (model / source) tests

Schema changes

elementary.schema_changes

Executes schema changes monitor that alerts on deleted table, deleted or added columns, or change of data type of a column. Data Types note: Snowflake, Redshift and Postgres all has “synonym” data types, like VARCHAR and TEXT in Snowflake. These will be considered as equals by the tests. Types are converted internally to deal with this.

Not supported in Databricks.

version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.schema_changes

Schema changes from baseline

elementary.schema_changes_from_baseline

Checks for schema changes against baseline columns defined in a source’s or model’s configuration. For this test to work, the configuration should contain columns and data types.

The initial configuration needed for this test can be auto-generated (see details below).

Supported parameters for the test:

  • fail_on_added - If set, the test will fail if there are columns in the table that do not exist in the baseline (default: False - meaning added columns won’t cause the test to fail).
  • enforce_types - If set, the test will raise an error if there are columns that are defined without a data type (default: False - in this case the test will not fail, but instead will only verify that the column exists and not its type)

Not supported in Databricks.

version: 2

sources:
  - name: < source name >
    database: < database name >
    schema: < schema name >
    tables:
      - name: < table name >
        columns:
          - name: < column 1 >
            data_type: < data type 1 >
          - name: < column 2 >
            data_type: < data type 2 >
        tests:
          - elementary.schema_changes_from_baseline

Column test

JSON schema

elementary.json_schema

Allows validating that a string column matches a given JSON schema. The test expects a JSON schema as input, defined according to the JSON schema standard, defined in YAML format (see an example below).

This test along with the relevant JSON schema can be auto-generated.

Note: This test relies on our Python tests capability, and is currently only supported for Snowflake and BigQuery data warehouses.

version: 2

models:
  - name: < model name >
    columns:
      - name: < column name >
        tests:
          - elementary.json_schema: <json schema parameters>

Auto-generate schema tests

To make it easier to configure schema tests, Elementary provides dbt operations to auto generate tests configuration based on the existing schemas.

Auto-generate baseline schema tests

In order for the schema changes from baseline test to work, a baseline needs to be generated from an initial state of the table and it should be added to the configuration of the source / model under “columns”. The baseline consists of the name and data type for each column.

In order to generate the baseline, Elementary provides the generate_schema_baseline_test macro. By default, running it will generate a schema_changes_from_baseline test for all sources, but it can be customized with the following arguments:

  • name - run on a specific source / model
  • include_models - whether or not to generate tests for models (default - false)
  • include_sources - whether or not to generate tests for sources (default - true)
  • fail_on_added - if set, the “fail_on_added” parameter will be added to the configuration of the tests with the supplied setting
  • enforce_types - if set, the “enforce_types” parameter will be added to the configuration of the tests with the supplied setting

Examples:

# Generate a schema changes from baseline test for all sources
dbt run-operation elementary.generate_schema_baseline_test

# Generate a schema changes from baseline test for a specific model / source named "orders"
dbt run-operation elementary.generate_schema_baseline_test --args '{"name": "orders"}'

# Generate a schema changes from baseline test for all sources and all models
dbt run-operation elementary.generate_schema_baseline_test --args '{"include_models": true}'

# Generate a schema changes from baseline test with the "fail_on_added" and "enforce_types" parameters set to true
dbt run-operation elementary.generate_schema_baseline_test --args '{"fail_on_added": true, "enforce_types": true}'

Auto-generate JSON schema tests

Elementary provides the generate_json_schema_test macro in order to auto-generate the JSON schema for a given column using existing data.

Example usage:

dbt run-operation elementary.generate_json_schema_test --args '{"node_name": "login_events", "column_name": "raw_event_data"}'

Will print:

Please add the following test to your model configuration:
----------------------------------------------------------

columns:
  - name: raw_event_data
    tests:
      - elementary.json_schema:
          type: object
          properties:
            event_id:
              type: integer
            event_name:
              type: string
            event_args:
              type: array
              items:
                type: string
          required:
            - event_id
            - event_name

Note: The generate_json_schema_test macro relies on a 3rd-party python library called genson. If you are using BigQuery, you will need to pre-install this library in your Dataproc cluster (See dbt’s documentation on Python models for more details)