Data anomaly detection

The Elementary dbt package provides data monitoring and anomaly detection as dbt tests. The tests collect data quality metrics. On each execution, the latest metrics are compared to historical values to detect anomalies. These tests are configured and executed like any other tests in your project.

Test and monitor types

What are data monitors?

Data monitors are SQL query generators. The generated queries are executed to collect a specific metric of the data and track it over time.

How do monitors work in Elementary data tests?

Monitors have two modes:

Time buckets

If a timestamp_column is defined for the table, the monitor collects metrics in time buckets. Using time buckets is highly recommended for every table that has a time field, both for better performance and for better anomaly detection.

The default time bucket is 24 hours.
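To illustrate the idea of time buckets, here is a minimal Python sketch that floors each row's timestamp to a 24-hour bucket and counts rows per bucket. It is not Elementary's implementation (the package generates SQL); the function names `bucket_start` and `row_count_per_bucket` and the choice of the Unix epoch as the bucket origin are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(hours=24)  # the default bucket size stated above


def bucket_start(ts: datetime, bucket: timedelta = BUCKET) -> datetime:
    """Floor a timestamp to the start of its bucket (origin: Unix epoch, an assumption)."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // bucket) * bucket


def row_count_per_bucket(timestamps):
    """Return {bucket_start: row_count}, i.e. one metric value per time bucket."""
    return dict(Counter(bucket_start(ts) for ts in timestamps))


# Hypothetical values of a table's timestamp column.
rows = [
    datetime(2023, 1, 1, 3, 0),
    datetime(2023, 1, 1, 22, 30),
    datetime(2023, 1, 2, 9, 15),
]
counts = row_count_per_bucket(rows)
```

Each bucket yields one data point in the metric's history, which is why bucketed tables give the anomaly detection more values to compare against than a single whole-table reading.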

Global

If no timestamp column is configured, monitors query the entire table, at intervals that are at least as long as the time bucket.

Available monitors

Table level monitors

Monitor name
freshness
row_count

Column level monitors

Property              Column type
null_count            any
null_percent          any
min_length            string
max_length            string
average_length        string
missing_count         string
missing_percent       string
min                   numeric
max                   numeric
zero_count            numeric
zero_percent          numeric
standard_deviation    numeric
variance              numeric
sum                   numeric
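As a rough illustration of what a few of these column-level metrics measure, the following Python sketch computes them over an in-memory list of string values. It is only a sketch: the function name `column_metrics` is hypothetical, the percentage is assumed to be expressed on a 0 to 100 scale, and Elementary itself computes the metrics in SQL.

```python
def column_metrics(values):
    """Compute a few column-level metrics for a string column (None means NULL)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    lengths = [len(v) for v in non_null]
    null_count = total - len(non_null)
    return {
        "null_count": null_count,
        # Assumption: percent is reported on a 0-100 scale.
        "null_percent": 100.0 * null_count / total if total else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "average_length": sum(lengths) / len(lengths) if lengths else None,
    }


# Hypothetical column values: two regular strings, one NULL, one empty string.
metrics = column_metrics(["ab", None, "abcd", ""])
```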

Dimension monitors

Dimension monitors track the frequency of field values, that is, the row count of each group defined by the given columns or expressions.
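The grouping behind a dimension monitor can be sketched in Python as a row count per combination of dimension values. The function name `dimension_row_counts` and the sample columns `country` and `platform` are hypothetical; in practice the equivalent is a SQL GROUP BY over the configured columns or expressions.

```python
from collections import Counter


def dimension_row_counts(rows, dimensions):
    """Count rows per combination of values in the given dimension columns."""
    return Counter(tuple(row[d] for d in dimensions) for row in rows)


# Hypothetical table rows.
rows = [
    {"country": "US", "platform": "ios"},
    {"country": "US", "platform": "ios"},
    {"country": "DE", "platform": "web"},
]
counts = dimension_row_counts(rows, ["country", "platform"])
```

A sudden drop or spike in the count for one group, such as `("US", "ios")`, is the kind of change this monitor is meant to surface.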

Anomaly detection

Elementary uses the "standard score", also known as the "Z-score", for anomaly detection. This score represents how many standard deviations a value lies from the average of a set of values.

According to the empirical rule, in a standard normal distribution:

  • ~68% of values have an absolute z-score of 1 or less.
  • ~95% of values have an absolute z-score of 2 or less.
  • ~99.7% of values have an absolute z-score of 3 or less.
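The percentages above can be checked numerically. For a normally distributed variable, the share of values with an absolute z-score of at most k equals erf(k / sqrt(2)); the short Python sketch below evaluates it for k = 1, 2, 3. The function name `share_within` is chosen for this example only.

```python
import math


def share_within(k: float) -> float:
    """Fraction of a normal distribution with an absolute z-score of at most k."""
    return math.erf(k / math.sqrt(2))


# Percentages for k = 1, 2, 3, rounded to one decimal place.
shares = {k: round(100 * share_within(k), 1) for k in (1, 2, 3)}
# shares == {1: 68.3, 2: 95.4, 3: 99.7}
```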

Values with an absolute standard score of 3 or more are considered outliers, and 3 is a recommended threshold for anomaly detection. Elementary uses this threshold by default; it can be changed with the var anomaly_score_threshold in the global configuration.
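The scoring and threshold check described above can be sketched as follows. This is an illustration under stated assumptions, not Elementary's code: the function names are hypothetical, the sample standard deviation is used (the source does not say which estimator Elementary applies), and the handling of a zero standard deviation is a choice made for this example.

```python
import math
import statistics

ANOMALY_SCORE_THRESHOLD = 3.0  # mirrors the default threshold described above


def z_score(value, history):
    """Number of standard deviations between value and the mean of history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # assumption: sample standard deviation
    if stdev == 0:
        # Assumption: with a constant history, any deviation scores as infinite.
        return 0.0 if value == mean else math.copysign(math.inf, value - mean)
    return (value - mean) / stdev


def is_anomaly(value, history, threshold=ANOMALY_SCORE_THRESHOLD):
    """Flag the value when its absolute z-score reaches the threshold."""
    return abs(z_score(value, history)) >= threshold


# Hypothetical daily row counts from previous runs.
history = [100, 102, 98, 101, 99]
latest_is_anomaly = is_anomaly(110, history)   # far from the historical mean
typical_is_anomaly = is_anomaly(101, history)  # within normal variation
```

Raising the threshold makes the check less sensitive (fewer values flagged), and lowering it makes the check more sensitive.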

You can use the model anomaly_sensitivity to see whether the metric values from your last run would have been considered anomalies under different score thresholds. This can help you decide whether the sensitivity needs adjusting:

[Image: Z Score]