(Not a dbt user? You can still use Elementary: reach out to us on Slack and we will help.)
Elementary is the first solution that delivers data monitoring and anomaly detection as dbt tests. Elementary dbt tests are actually data monitors that collect metrics and metadata over time. On each execution, the tests analyze the new data, compare it to historical metrics, and alert on anomalies and outliers. These tests are configured and executed like any other tests in your project.
After you add the package to your project, you can add Elementary tests to your models and sources configuration.
When you execute dbt run, Elementary creates models and uploads artifacts.
When you execute dbt test, Elementary data monitors collect metrics according to your configuration and analyze them to detect anomalies. If anomalies are detected, the test fails or warns.
After dbt run and dbt test, execute the Elementary CLI edr monitor command. This aggregates the results and sends Slack alerts on new anomalies.
The usage flow is as follows:
1. Execute dbt run as usual (the Elementary models in this run have minimal performance impact).
2. Execute dbt test as usual; this time, Elementary data monitoring and anomaly detection run as part of your tests.
3. Execute edr monitor to aggregate the results and get Slack alerts.
Data monitors are SQL query generators that are executed to collect a specific metric of the data and track it over time.
Monitors have two modes:
If a timestamp_column is defined for the table, the monitor collects metrics in timeframe buckets. Using time buckets is highly recommended on every table that has a time field, both for performance and for better anomaly detection.
The default time bucket is 24 hours.
If no timestamp column is configured, monitors query the entire table, at intervals that are at least the duration of the timeframe bucket.
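To make the idea of timeframe buckets concrete, here is a minimal Python sketch that floors timestamps to 24-hour buckets and computes one row-count value per bucket. It is an illustration only, not Elementary's generated SQL; the function names and the sample timestamps are hypothetical, and aligning buckets to the Unix epoch is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(hours=24)  # the default bucket size mentioned above


def bucket_start(ts, bucket=BUCKET):
    """Floor a timestamp to the start of its bucket (assumes epoch-aligned buckets)."""
    epoch = datetime(1970, 1, 1)
    whole_buckets = (ts - epoch) // bucket  # timedelta // timedelta gives an int
    return epoch + whole_buckets * bucket


def row_count_by_bucket(timestamps, bucket=BUCKET):
    """Return {bucket_start: row_count}: one metric value per time bucket."""
    return dict(Counter(bucket_start(ts, bucket) for ts in timestamps))


# Hypothetical values of a table's timestamp column.
rows = [
    datetime(2022, 5, 1, 3, 0),
    datetime(2022, 5, 1, 22, 30),
    datetime(2022, 5, 2, 9, 15),
]
counts = row_count_by_bucket(rows)
for start, count in sorted(counts.items()):
    print(start.date(), count)
```

Because each bucket yields its own metric value, a single run can produce a history of comparable data points instead of one number for the whole table.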
Dimension monitors track the frequency of field values (the row count for each group defined by the given columns or expressions).
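As a rough illustration of what a dimension metric looks like, the sketch below counts rows per distinct combination of dimension values. The function name, column names, and rows are hypothetical, and this is not how Elementary itself computes the metric (that happens in SQL).

```python
from collections import Counter


def dimension_row_counts(rows, dimensions):
    """Count rows per distinct combination of the given dimension columns."""
    return Counter(tuple(row[d] for d in dimensions) for row in rows)


# Hypothetical table rows.
rows = [
    {"country": "US", "platform": "ios"},
    {"country": "US", "platform": "android"},
    {"country": "US", "platform": "ios"},
    {"country": "DE", "platform": "ios"},
]
by_country = dimension_row_counts(rows, ["country"])
by_country_platform = dimension_row_counts(rows, ["country", "platform"])
```

Each group's count can then be tracked over time like any other metric, so a sudden drop in one group stands out even when the total row count looks normal.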
Elementary uses the “standard score”, also known as the “Z-score”, for anomaly detection. This score represents the number of standard deviations of a value from the average of a set of values.
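According to the empirical rule, in a normal distribution about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.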
Values with a standard score of 3 and above are considered outliers, and this is a recommended threshold for anomaly detection.
This is the default Elementary uses as well, and it can be changed using the var anomaly_score_threshold in the global configuration.
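The scoring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Elementary's implementation: the function names and sample history are hypothetical, the sample standard deviation is an arbitrary choice, and treating scores in both directions as anomalous is also an assumption.

```python
import math
from statistics import mean, stdev

ANOMALY_SCORE_THRESHOLD = 3.0  # mirrors the default threshold described above


def z_score(value, history):
    """Number of standard deviations between value and the mean of history."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (an assumption)
    if sigma == 0:
        # A constant history: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


def is_anomaly(value, history, threshold=ANOMALY_SCORE_THRESHOLD):
    """Flag values whose absolute score reaches the threshold."""
    return abs(z_score(value, history)) >= threshold


# Hypothetical daily row counts from previous runs (mean 100, stdev 2).
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(z_score(101, history), is_anomaly(101, history))
print(z_score(110, history), is_anomaly(110, history))
```

With this history, 101 sits half a standard deviation from the mean and passes, while 110 sits five standard deviations away and is flagged. Lowering the threshold makes the check more sensitive, at the cost of more false alarms.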
You can use the model anomaly_threshold_sensitivity to see whether the metric values from your last run would have been considered anomalies at different score thresholds. This can help you decide whether the sensitivity needs adjusting.
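The kind of comparison this enables can be sketched as follows. The sketch does not reproduce the anomaly_threshold_sensitivity model or its columns; the function name, the candidate thresholds, and the sample history are all hypothetical.

```python
from statistics import mean, stdev


def sensitivity_report(value, history, thresholds=(2.0, 3.0, 4.0)):
    """For each candidate threshold, report whether value would be flagged."""
    score = abs(value - mean(history)) / stdev(history)
    return {t: score >= t for t in thresholds}


# Hypothetical metric history (mean 100, stdev 2) and a new value to evaluate.
history = [100, 102, 98, 101, 99, 100, 103, 97]
report = sensitivity_report(107, history)
for threshold, flagged in report.items():
    print(threshold, flagged)
```

Here the value 107 has a score of 3.5, so it would be flagged at thresholds 2 and 3 but not at 4. Seeing where recent values flip from flagged to not flagged gives a sense of how a threshold change would affect alert volume.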