Add anomaly detection tests
After you install the dbt package, you can add Elementary data anomaly detection tests.
Data anomaly detection dbt tests
Elementary dbt package includes anomaly detection tests, implemented as dbt tests. These tests can detect anomalies in volume, freshness, null rates, and anomalies in specific dimensions, among others. The tests are configured and executed like any other tests in your project.Anomaly detection test result example from Elementary report
Available anomaly detection tests
Table (model / source) tests
Monitors the row count of your table over time per time bucket (if configured without
timestamp_column, will count table total rows).
Monitors the freshness of your table over time, as the expected time between data updates.
Event freshness anomalies
Monitors the freshness of event data over time, as the expected time it takes each event to load -
that is, the time between when the event actually occurs (the
event timestamp), and when it is loaded to the
update timestamp). Configuring
event_timestamp_column is required, and
update_timestamp_column is optional.
This test monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in the distribution.
It is best to configure it on low-cardinality fields.
The test counts rows grouped by given
All columns anomalies
Executes column level monitors and anomaly detection on all the columns of the table.
Specific monitors are detailed here.
You can use
column_anomalies param to override the default monitors, and
exclude_regexp to exclude columns from the test.
Executes column level monitors and anomaly detection on the column.
Specific monitors are detailed here and can be configured using
Adding tests examples:
version: 2 models: - name: < model name > config: elementary: timestamp_column: < timestamp column > tests: - elementary.freshness_anomalies: # optional - configure different freshness column than timestamp column where_expression: < sql expression > time_bucket: period: < time period > count: < number of periods > - elementary.all_columns_anomalies: column_anomalies: < specific monitors, all if null > where_expression: < sql expression > time_bucket: period: < time period > count: < number of periods > - elementary.schema_changes - elementary.dimension_anomalies: dimensions: < columns or sql expressions of columns > # optional - configure a where a expression to accurate the dimension monitoring where_expression: < sql expression > time_bucket: period: < time period > count: < number of periods > - name: < model name > ## if no timestamp is configured, elementary will monitor without time filtering columns: - name: < column name > tests: - elementary.column_anomalies: column_anomalies: < specific monitors, all if null >
Configure your elementary anomaly detection tests
If your data set has a timestamp column that represents the creation time of a
field, it is highly recommended configuring it as a
To support different types of data sets, the tests have configuration that can be used to customize their behavior. Read more about data anomaly detection tests configuration here.
We recommend adding a tag to the tests so you could execute these in a dedicated run using the selection parameter
If you wish to only be warned on anomalies, configure the
severity of the tests to
What happens on each test?
Upon running a test, your data is split into time buckets based on the
time_bucket field and is limited by
days_back var. The test then compares a certain metric (e.g. row count) of the buckets that are within the detection
backfill_days) to the row count of all the previous time buckets within the
If there were any anomalies in the detection period, the test will fail.
On each test elementary package executes the relevant monitors, and searches for anomalies by comparing to historical metrics.
To learn more, refer to core concepts.
What does it mean when a test fails?
When a test fail, it means that an anomaly was detected on this metric and dataset. To learn more, refer to core concepts and anomaly detection.