The Elementary dbt package includes data monitoring and anomaly detection tests, implemented as dbt tests. The tests collect data quality metrics, and on each execution the latest metrics are compared to historical values to detect anomalies. The tests are configured and executed like any other tests in your project.
Anomaly detection tests are not yet supported on Databricks in dbt Cloud.
Upon running the test, your data is split into time buckets (daily by default, configurable with the time_bucket
field), and the row count per bucket is computed for the last days_back days (14 by default).
The test then compares the row count of the buckets within the detection period (last 2 days by default, controlled by the
backfill_days var) to the row count of the previous time buckets.
If there were any anomalies during the detection period, the test will fail.
For advanced configuration of Elementary anomaly tests, see the configuration section below.
```yaml
version: 2

models:
  - name: < model name >
    tests:
      - elementary.volume_anomalies:
          timestamp_column: < timestamp column >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
```
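To make the metric-collection step above concrete, here is a minimal Python sketch of counting rows per daily bucket over the last days_back days. It is an illustration only, not the package's implementation (Elementary computes these metrics in SQL on your warehouse); the function name and inputs are hypothetical.

```python
# Hypothetical illustration of the volume metric: row counts per daily bucket
# over the last `days_back` days. Not Elementary's implementation.
from collections import Counter
from datetime import datetime, timedelta, timezone

def row_counts_per_bucket(timestamps, days_back=14):
    """Count rows per daily bucket for the last `days_back` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days_back)
    counts = Counter(ts.date() for ts in timestamps if ts >= cutoff)
    return dict(sorted(counts.items()))

# Example: three rows today, one row yesterday
now = datetime.now(timezone.utc)
print(row_counts_per_bucket([now, now, now, now - timedelta(days=1)]))
```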
elementary.freshness_anomalies
Monitors the freshness of your table over time, as the expected time between data updates.
Upon running the test, your data is split into time buckets (daily by default, configurable with the time_bucket
field), and the maximum freshness value per bucket is computed for the last days_back days (14 by default).
The test then compares the freshness of the buckets within the detection period (last 2 days by default, controlled by the
backfill_days var) to the freshness of the previous time buckets.
If there were any anomalies during the detection period, the test will fail.
For advanced configuration of Elementary anomaly tests, see the configuration section below.
```yaml
version: 2

models:
  - name: < model name >
    tests:
      - elementary.freshness_anomalies:
          timestamp_column: < timestamp column > # Mandatory
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
```
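For intuition only, one plausible per-bucket freshness metric is the longest gap between consecutive update timestamps within the bucket; the sketch below computes such a metric in Python. This is a simplification and an assumption, not Elementary's actual computation; the comparison against previous buckets then works as for volume.

```python
# Hypothetical per-bucket freshness metric: the longest gap (in seconds) between
# consecutive updates within each daily bucket. Illustrative only.
from collections import defaultdict

def max_update_gap_per_bucket(update_timestamps):
    by_day = defaultdict(list)
    for ts in update_timestamps:
        by_day[ts.date()].append(ts)
    gaps = {}
    for day, stamps in by_day.items():
        stamps.sort()
        gaps[day] = max(
            ((later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])),
            default=0.0,
        )
    return gaps
```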
elementary.event_freshness_anomalies
Monitors the freshness of event data over time, as the expected time it takes each event to load,
that is, the time between when the event actually occurs (the event timestamp) and when it is loaded to the
database (the update timestamp).
This test complements the freshness_anomalies test and is primarily intended for data that is updated in a
continuous / streaming fashion.
The test can work in a couple of modes:
If only an event_timestamp_column is supplied, the test measures over time the difference between the current
timestamp (“now”) and the most recent event timestamp.
If both an event_timestamp_column and an update_timestamp_column are provided, the test will measure over time
the difference between these two columns.
For advanced configuration of Elementary anomaly tests, see the configuration section below.
```yaml
version: 2

models:
  - name: < model name >
    tests:
      - elementary.event_freshness_anomalies:
          event_timestamp_column: < timestamp column > # Mandatory
          update_timestamp_column: < timestamp column > # Optional
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
```
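The two modes can be summarized with a small sketch: the lag of an event is measured either against the current time or against the update timestamp. The function and its arguments are hypothetical, for illustration only, not the package's implementation.

```python
# Hypothetical illustration of the two event-freshness modes described above.
from datetime import datetime, timezone

def event_freshness_seconds(event_ts, update_ts=None):
    """Lag of a single event, in seconds.

    - Only event_ts given:       lag = now - event_ts
    - update_ts also provided:   lag = update_ts - event_ts
    """
    reference = update_ts if update_ts is not None else datetime.now(timezone.utc)
    return (reference - event_ts).total_seconds()
```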
elementary.dimension_anomalies
This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in
the distribution.
It is best to configure it on low-cardinality fields.
The test counts rows grouped by given columns/expressions, and can be configured using the dimensions
and where_expression keys.
For advanced configuration of Elementary anomaly tests, see the configuration section below.
```yaml
version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - configure a where expression to refine the dimension monitoring
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
```
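As a rough illustration of the metric behind this test, the sketch below counts rows per dimension value per daily bucket; an unexpected shift in these counts is what the anomaly detection flags. Names and inputs are hypothetical, not Elementary's implementation.

```python
# Hypothetical illustration of the dimension metric: row counts per
# (daily bucket, dimension value) pair. Illustrative only.
from collections import Counter

def counts_per_dimension(rows, dimension, timestamp_field="updated_at"):
    """rows: iterable of dicts; returns a Counter keyed by (bucket_day, dimension_value)."""
    return Counter((row[timestamp_field].date(), row[dimension]) for row in rows)
```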
elementary.all_columns_anomalies
Executes column level monitors and anomaly detection on all the columns of the table. The specific monitors
are detailed in the Elementary documentation and can be configured using
the column_anomalies key.
For advanced configuration of Elementary anomaly tests, see the configuration section below.
```yaml
version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies: < specific monitors, all if null >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
```
elementary.column_anomalies
Executes column level monitors and anomaly detection on the configured column. The specific monitors
are detailed in the Elementary documentation and can be configured using
the column_anomalies key.
For advanced configuration of Elementary anomaly tests, see the configuration section below.
```yaml
version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
              time_bucket: # Daily by default
                period: < time period >
                count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
```
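For intuition, here is a hedged sketch of one column-level monitor, a per-bucket null rate; other column monitors follow the same collect-then-compare pattern. The code and names are illustrative assumptions, not Elementary's actual queries.

```python
# Hypothetical illustration of a single column-level monitor:
# the null rate of a column per daily bucket. Illustrative only.
from collections import defaultdict

def null_rate_per_bucket(rows, column, timestamp_field="updated_at"):
    totals, nulls = defaultdict(int), defaultdict(int)
    for row in rows:
        day = row[timestamp_field].date()
        totals[day] += 1
        if row.get(column) is None:
            nulls[day] += 1
    return {day: nulls[day] / totals[day] for day in totals}
```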
The Elementary anomaly detection tests described above work out of the box with their default configuration. However,
additional configuration is supported to customize their behavior, depending on your needs.
Elementary tests have three levels of configuration: test arguments, model-level config (under config: elementary:), and global vars in dbt_project.yml; the more granular level takes precedence.
We recommend adding a tag to the tests so you can execute them in a dedicated run using the selection
parameter --select tag:elementary.
If you wish to only be warned on anomalies, configure the severity of the tests to warn.
```yaml
version: 2

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.table_anomalies:
          table_anomalies: < specific monitors, all if null >
          # optional - configure a different freshness column than the timestamp column
          freshness_column: < freshness_column >
          where_expression: < sql expression >
          time_bucket:
            period: < time period >
            count: < number of periods >
      - elementary.all_columns_anomalies:
          column_anomalies: < specific monitors, all if null >
          where_expression: < sql expression >
          time_bucket:
            period: < time period >
            count: < number of periods >
      - elementary.schema_changes
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - configure a where expression to refine the dimension monitoring
          where_expression: < sql expression >
          time_bucket:
            period: < time period >
            count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
```
Upon running a test, your data is split into time buckets based on the time_bucket field and is limited by
the days_back var. The test then compares a certain metric (e.g. row count) of the buckets within the detection
period (backfill_days) to the same metric of all the previous time buckets.
If there were any anomalies in the detection period, the test will fail.
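As an illustration of the comparison step, the sketch below flags detection-period buckets whose metric deviates strongly from the historical buckets, using a z-score. The threshold and function are assumptions for illustration; they are not Elementary's exact algorithm or defaults.

```python
# Hypothetical illustration of the comparison step: flag detection-period buckets whose
# metric is far from the historical buckets. The z-score threshold is an assumption.
from statistics import mean, stdev

def detect_anomalies(history, detection_period, threshold=3.0):
    """history / detection_period: lists of per-bucket metric values (e.g. row counts)."""
    if len(history) < 2:
        return []  # not enough history to estimate an expected range
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return [value for value in detection_period if value != mu]
    return [value for value in detection_period if abs(value - mu) / sigma > threshold]

# Example: a stable history of ~100 rows per day, then a sudden drop
print(detect_anomalies([98, 102, 100, 99, 101, 103, 97], [100, 3]))  # -> [3]
```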
On each test, the Elementary package executes the relevant monitors and searches for anomalies by comparing the
latest metrics to historical ones.
At the end of the dbt test run, all results and collected metrics are merged into the elementary models.