Generate your anomaly test with Elementary AI
Let our Slack chatbot create the anomaly test you need.
Verify the data collection for your anomaly test
First, check if your test uses a timestamp column:
If you have a timestamp-based test (recommended)
- Metrics are calculated by grouping data into time buckets (default: ‘day’)
- Detection period (default: 2 days) determines how many buckets are being tested
- Training period data (default: 14 days) comes from historical buckets, allowing immediate anomaly detection with sufficient history
- Each row should represent one time bucket (e.g., daily metrics)
- Gaps in `metric_timestamp` might indicate data collection issues
- Training uses historical buckets for anomaly detection
- The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
Common issues to look for:
- Missing or null values in the timestamp column
- Timestamp column not in expected format
- No data in specified training period
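For reference, a timestamp-based anomaly test might be configured like this. The model and column names are illustrative, the parameter names follow Elementary's anomaly test configuration, and the values shown mirror the defaults described above — adjust to your project:

```yaml
# schema.yml -- illustrative model and column names
models:
  - name: orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at  # must be a timestamp or date column
          time_bucket:
            period: day
            count: 1
          training_period:
            period: day
            count: 14
          detection_period:
            period: day
            count: 2
```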
If you don't have a timestamp configured
- Training period data builds up over multiple test runs, using the test run time as the timestamp. This requires time to collect enough points; for a 14-day training period, the test needs 14 runs on different days to build a full training set.
- Metrics are calculated for the entire table in each test run
- Detection period (default: 2 days) determines how many buckets are being tested
- You should see one metric per test run (and per dimension)
- Training requires multiple test runs over time
- Each new test run creates the training point for a time bucket; a second run within the same bucket overwrites the first.
- The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
Common issues to look for:
- Test hasn’t run enough times
- Previous test runs failed
- Metrics not being saved between runs
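To confirm that training points are accumulating, you can query the metrics table directly. A sketch — the schema placeholder is whatever schema you configured for Elementary, and the column names should be verified against the `data_monitoring_metrics` model in your installed version:

```sql
-- expect one row per test run (and per dimension)
select
    full_table_name,
    metric_name,
    dimension,
    metric_value,
    bucket_end,
    updated_at
from <your_elementary_schema>.data_monitoring_metrics
where full_table_name = 'DATABASE.SCHEMA.TABLE_NAME'
order by updated_at desc;
```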
Verify anomaly calculations
Anomaly detection is influenced by:
- Detection period (default: 2 days) - the time window being tested
- Sensitivity (default: 3.0) - how many standard deviations from normal before flagging
- Training data from previous periods/runs
The `metrics_anomaly_score` model calculates the anomaly based on the data in `data_monitoring_metrics`. Key fields in `metrics_anomaly_score`:
- `anomaly_score`: the standardized score that measures how many standard deviations a data point is from the mean
- `is_anomaly`: a boolean field that indicates whether the anomaly score exceeds the configured threshold
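Conceptually, the score is a z-score of the tested bucket against the training set. A minimal sketch in Python — illustrative only; the function name and the plain mean/standard-deviation calculation are assumptions, not Elementary's actual implementation:

```python
# z-score of the latest metric value against the training window,
# flagged as an anomaly when it exceeds the sensitivity threshold
# (default 3.0). Illustrative only.
from statistics import mean, stdev

def anomaly_score(training_values, latest_value, sensitivity=3.0):
    mu = mean(training_values)
    sigma = stdev(training_values)
    score = (latest_value - mu) / sigma
    return score, abs(score) > sensitivity

# A stable series of daily row counts, then a sudden drop:
training = [100, 102, 98, 101, 99, 100, 103]
score, is_anomaly = anomaly_score(training, 60)
print(round(score, 1), is_anomaly)  # -23.5 True
```

With the default sensitivity of 3.0, any bucket more than three standard deviations from the training mean is flagged.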
'Not enough data to calculate anomaly' error
This occurs when there are fewer than 7 training data points. To resolve:
For timestamp-based tests:
- Check if your timestamp column has enough historical data
- Verify time buckets are being created correctly in `data_monitoring_metrics`
- Look for gaps in your data that might affect bucket creation
For non-timestamp tests:
- Run your tests multiple times to build up training data.
- Check `data_monitoring_metrics` to verify the data collection. The test needs data for at least 7 time buckets (e.g., 7 days) to calculate the anomaly.
Missing data in data_monitoring_metrics
If your test isn’t appearing in `data_monitoring_metrics`, verify the test configuration. Common causes:
- Incorrect timestamp column name
- Timestamp column contains null values or is not of type timestamp or date
- For non-timestamp tests: Test hasn’t run successfully
- Incorrect test syntax
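A quick way to rule out the first two causes is to profile the timestamp column directly. The table and column names below are placeholders for your own:

```sql
-- how many rows lack a timestamp, and what range is covered?
select
    count(*) as total_rows,
    count(*) - count(created_at) as null_timestamps,
    min(created_at) as earliest,
    max(created_at) as latest
from DATABASE.SCHEMA.TABLE_NAME;
```

If `null_timestamps` is high, or the range doesn’t cover the training period, the test has nothing to bucket.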
Training period changed, but results are the same
If you change the `training_period` var after Elementary tests have already executed, you will need to run a full refresh of the collected metrics; the next test runs will then collect data for the new `training_period` timeframe. The steps are:
- Change the var `training_period` in your `dbt_project.yml`.
- Run a full refresh of the `data_monitoring_metrics` model: `dbt run --select data_monitoring_metrics --full-refresh`
- Run the Elementary tests again.
edr report --days-back 45