Add anomaly detection tests

After you install the dbt package, you can add Elementary data anomaly detection tests.

Data anomaly detection dbt tests

The Elementary dbt package includes data monitoring and anomaly detection as dbt tests. The tests collect data quality metrics, and on each execution the latest metrics are compared to historical values to detect anomalies. These tests are configured and executed like any other tests in your project.

Anomaly detection tests are not yet supported on Databricks in dbt Cloud.

Available anomaly detection tests

Table (model / source) tests

Column tests

Advanced configuration for your elementary tests

The elementary anomaly detection tests described above work out of the box with their default configuration. However, additional configuration is supported to customize their behavior, depending on your needs.

Elementary tests have three levels of configurations:

  1. Test arguments - Arguments specific to each test.
  2. Table configuration - Configure the timestamp column and other details of a monitored table.
  3. Global vars - Optional configuration parameters of the operation.

More details on data tests configuration are available in the Elementary documentation.
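As a sketch of these three levels (the model name, column names, and values below are illustrative assumptions, not part of the package defaults), global vars go in dbt_project.yml, the table configuration sits under the model's config, and test arguments are passed to the test itself:

```yaml
# dbt_project.yml - global vars (level 3)
vars:
  days_back: 30      # history window for metric collection
  backfill_days: 2   # detection period compared against history

# models/schema.yml - table configuration (level 2) and test arguments (level 1)
models:
  - name: my_model                     # hypothetical model name
    config:
      elementary:
        timestamp_column: updated_at   # hypothetical timestamp column
    tests:
      - elementary.table_anomalies:
          table_anomalies:             # test-specific argument
            - row_count
```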

We recommend adding a tag to the tests so you can execute them in a dedicated run using the selection parameter --select tag:elementary. If you wish only to be warned on anomalies, configure the severity of the tests to warn.
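For example (the model name is a placeholder), a tag and a warn severity can be set on a test like this, using dbt's standard tags and severity configs:

```yaml
models:
  - name: my_model                 # hypothetical model name
    tests:
      - elementary.table_anomalies:
          tags: ["elementary"]     # enables a dedicated run of these tests
          config:
            severity: warn         # report anomalies without failing the run
```

Running `dbt test --select tag:elementary` then executes only the tagged elementary tests.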

version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.table_anomalies:
          table_anomalies: < specific monitors, all if null >
          # optional - configure different freshness column than timestamp column
          freshness_column: < freshness_column >
          where_expression: < sql expression >
          time_bucket:
            period: < time period >
            count: < number of periods >
      - elementary.all_columns_anomalies:
          column_anomalies: < specific monitors, all if null >
          where_expression: < sql expression >
          time_bucket:
            period: < time period >
            count: < number of periods >
      - elementary.schema_changes
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - configure a where expression to make the dimension monitoring more accurate
          where_expression: < sql expression >
          time_bucket:
            period: < time period >
            count: < number of periods >

  - name: < model name >
    # if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >

What happens on each test?

Upon running a test, your data is split into time buckets based on the time_bucket field and limited by the days_back var. The test then compares a certain metric (e.g. row count) of the buckets within the detection period (backfill_days) to the same metric of all the previous time buckets. If any anomalies are found in the detection period, the test fails. On each test, the elementary package executes the relevant monitors and searches for anomalies by comparing the results to historical metrics. At the end of the dbt test run, all results and collected metrics are merged into the elementary models.
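The bucketing described above can be tuned per test. As an illustrative sketch, setting the time_bucket argument like this groups the data into weekly buckets instead of the default daily ones:

```yaml
tests:
  - elementary.table_anomalies:
      time_bucket:
        period: week   # bucket the data by calendar week
        count: 1       # one period per bucket
```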

What does it mean when a test fails?

When a test fails, it means that an anomaly was detected in this metric and dataset. To learn more, refer to anomaly detection.