Data Anomaly Detection Tests
Data anomaly detection
The Elementary dbt package includes data monitoring and anomaly detection as dbt tests. The tests collect data quality metrics, and on each execution the latest metrics are compared to historical values to detect anomalies. These tests are configured and executed like any other tests in your project.
Test and monitor types
What are data monitors?
Data monitors are SQL query generators that are executed to collect a specific metric of the data and track it over time.
How do monitors work in Elementary data tests?
Monitors have two modes:
- If timestamp_column is defined for the table, the monitor collects metrics by timeframe buckets. Using time buckets is highly recommended on every table that has a time field, both for performance and for better anomaly detection. The default time bucket is 24 hours.
- If no timestamp column is configured, the monitor queries the entire table, in intervals that are at least the duration of the timeframe bucket.
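To make the idea of timeframe buckets concrete, here is a minimal Python sketch that groups row timestamps into 24-hour buckets and counts the rows in each one. It only illustrates the bucketing concept; it is not the SQL that Elementary generates, and the names bucket_start and row_count_by_bucket are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(hours=24)   # mirrors the 24-hour default time bucket
EPOCH = datetime(1970, 1, 1)   # assumption: buckets are aligned to midnight (naive timestamps)


def bucket_start(ts, bucket=BUCKET):
    """Return the start of the time bucket that contains ts."""
    whole_buckets = (ts - EPOCH) // bucket  # timedelta // timedelta gives an int
    return EPOCH + whole_buckets * bucket


def row_count_by_bucket(timestamps, bucket=BUCKET):
    """Count rows per time bucket, ordered by bucket start."""
    counts = Counter(bucket_start(ts, bucket) for ts in timestamps)
    return dict(sorted(counts.items()))


# Hypothetical values of a table's timestamp column.
rows = [
    datetime(2024, 1, 1, 3, 0),
    datetime(2024, 1, 1, 22, 30),
    datetime(2024, 1, 2, 9, 15),
]
counts = row_count_by_bucket(rows)
```

Each bucket yields one metric value (here, a row count), which gives the anomaly detection a time series to compare against instead of a single number for the whole table.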
Table level monitors
Column level monitors
Dimension monitors track the frequency of field values (row count for groups based on given columns/expressions).
Elementary uses the “standard score”, also known as the “Z-score”, for anomaly detection. This score represents the number of standard deviations between a value and the average of a set of values.
According to the empirical rule, in a standard normal distribution:
- ~68% of values have an absolute z-score of 1 or less.
- ~95% of values have an absolute z-score of 2 or less.
- ~99.7% of values have an absolute z-score of 3 or less.
Values with an absolute standard score of 3 or above are considered outliers, and this is a recommended threshold for anomaly detection.
This is also the default that Elementary uses, and it can be changed using the var anomaly_score_threshold in the global configuration.
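The standard score calculation can be sketched in a few lines of Python. This is an illustration of the formula only, not Elementary's implementation: the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def z_score(history, latest):
    """Number of standard deviations between latest and the mean of history."""
    mu = mean(history)
    sigma = stdev(history)  # assumption: sample standard deviation
    if sigma == 0:
        # All historical values are identical, so any deviation is unbounded.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma


def is_anomaly(history, latest, threshold=3.0):
    """Flag latest when its absolute z-score reaches the threshold."""
    return abs(z_score(history, latest)) >= threshold


# Hypothetical daily row counts collected on previous runs.
history = [100, 102, 98, 101, 99]

spike = is_anomaly(history, 110)    # far from the historical mean of 100
normal = is_anomaly(history, 101)   # well within one standard deviation
```

With this history the mean is 100 and the sample standard deviation is about 1.58, so a value of 110 scores roughly 6.3 and is flagged, while 101 scores roughly 0.63 and is not.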
You can use the model anomaly_sensitivity to see if values of metrics from your last run would have been considered anomalies in different scores. This can help you decide if there is a need to adjust the sensitivity.
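The kind of comparison described here can be approximated with a short Python sketch that checks one z-score against several candidate thresholds. It is not the anomaly_sensitivity model itself; the function name and the list of thresholds are assumptions chosen for illustration.

```python
def anomaly_by_threshold(score, thresholds=(1.5, 2.0, 2.5, 3.0, 3.5, 4.0)):
    """Map each candidate threshold to whether the score would be flagged."""
    return {t: abs(score) >= t for t in thresholds}


# A hypothetical metric whose latest value has a z-score of 2.7.
result = anomaly_by_threshold(2.7)
```

In this example the value would be flagged at thresholds of 2.5 and below but not at the default of 3, which is the sort of borderline case where adjusting the sensitivity is worth considering.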