Generate your anomaly test with Elementary AI
Let our Slack chatbot create the anomaly test you need.
Verify the data collection for your anomaly test
First, check if your test uses a timestamp column:
If you have a timestamp-based test (recommended)
- Metrics are calculated by grouping data into time buckets (default: ‘day’)
- Detection period (default: 2 days) determines how many buckets are being tested
- Training period data (default: 14 days) comes from historical buckets, allowing immediate anomaly detection with sufficient history
- Each row should represent one time bucket (e.g., daily metrics)
- Gaps in `metric_timestamp` might indicate data collection issues
- Training uses historical buckets for anomaly detection
- The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
Common issues to look for:
- Missing or null values in the timestamp column
- Timestamp column not in expected format
- No data in specified training period
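For reference, a timestamp-based anomaly test might be configured like this. The model and column names are illustrative, the parameter names follow Elementary's anomaly test configuration, and the values shown mirror the defaults described above — adjust to your project:

```yaml
# schema.yml -- illustrative model and column names
models:
  - name: orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at  # must be a timestamp or date column
          time_bucket:
            period: day
            count: 1
          training_period:
            period: day
            count: 14
          detection_period:
            period: day
            count: 2
```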
If you don't have a timestamp configured
- Training period data builds up over multiple test runs, using the test run time as the timestamp. This requires time to collect enough points; for a 14-day training period, the test needs 14 runs on different days to build a full training set.
- Metrics are calculated for the entire table in each test run
- Detection period (default: 2 days) determines how many buckets are being tested
- You should see one metric per test run (and per dimension)
- Training requires multiple test runs over time
- Each new test run creates the training point for a time bucket; a second run within the same bucket overwrites the first.
- The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
Common issues to look for:
- Test hasn’t run enough times
- Previous test runs failed
- Metrics not being saved between runs
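To confirm that training points are accumulating, you can query the metrics table directly. A sketch — the schema placeholder is whatever schema you configured for Elementary, and the column names should be verified against the `data_monitoring_metrics` model in your installed version:

```sql
-- expect one row per test run (and per dimension)
select
    full_table_name,
    metric_name,
    dimension,
    metric_value,
    bucket_end,
    updated_at
from <your_elementary_schema>.data_monitoring_metrics
where full_table_name = 'DATABASE.SCHEMA.TABLE_NAME'
order by updated_at desc;
```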
Verify anomaly calculations
Anomaly detection is influenced by:
- Detection period (default: 2 days) - the time window being tested
- Sensitivity (default: 3.0) - how many standard deviations from normal before flagging
- Training data from previous periods/runs
The `metrics_anomaly_score` model calculates the anomaly based on the data in `data_monitoring_metrics`. Key fields in `metrics_anomaly_score`:
- `anomaly_score`: the standardized score that measures how many standard deviations a data point is from the mean
- `is_anomaly`: a boolean field that indicates whether the anomaly score exceeds the configured threshold
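Conceptually, the score is a z-score of the tested bucket against the training set. A minimal sketch in Python — illustrative only; the function name and the plain mean/standard-deviation calculation are assumptions, not Elementary's actual implementation:

```python
# z-score of the latest metric value against the training window,
# flagged as an anomaly when it exceeds the sensitivity threshold
# (default 3.0). Illustrative only.
from statistics import mean, stdev

def anomaly_score(training_values, latest_value, sensitivity=3.0):
    mu = mean(training_values)
    sigma = stdev(training_values)
    score = (latest_value - mu) / sigma
    return score, abs(score) > sensitivity

# A stable series of daily row counts, then a sudden drop:
training = [100, 102, 98, 101, 99, 100, 103]
score, is_anomaly = anomaly_score(training, 60)
print(round(score, 1), is_anomaly)  # -23.5 True
```

With the default sensitivity of 3.0, any bucket more than three standard deviations from the training mean is flagged.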
'Not enough data to calculate anomaly' error
This occurs when there are fewer than 7 training data points. To resolve:
For timestamp-based tests:
- Check if your timestamp column has enough historical data
- Verify time buckets are being created correctly in `data_monitoring_metrics`
- Look for gaps in your data that might affect bucket creation
For non-timestamp tests:
- Run your tests multiple times to build up training data.
- Check `data_monitoring_metrics` to verify the data collection. The test needs data for at least 7 time buckets (e.g., 7 days) to calculate the anomaly.
Missing data in data_monitoring_metrics
If your test isn’t appearing in `data_monitoring_metrics`, verify the test configuration. Common causes:
- Incorrect timestamp column name
- Timestamp column contains null values or is not of type timestamp or date
- For non-timestamp tests: Test hasn’t run successfully
- Incorrect test syntax
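A quick way to rule out the first two causes is to profile the timestamp column directly. The table and column names below are placeholders for your own:

```sql
-- how many rows lack a timestamp, and what range is covered?
select
    count(*) as total_rows,
    count(*) - count(created_at) as null_timestamps,
    min(created_at) as earliest,
    max(created_at) as latest
from DATABASE.SCHEMA.TABLE_NAME;
```

If `null_timestamps` is high, or the range doesn’t cover the training period, the test has nothing to bucket.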
Training period changed, but results are the same
If you change the `training_period` var after Elementary tests have already executed, you will need to run a full refresh of the collected metrics; the next test runs will then collect data for the new `training_period` timeframe. The steps are:
- Change the var `training_period` in your `dbt_project.yml`.
- Run a full refresh of the `data_monitoring_metrics` model: `dbt run --select data_monitoring_metrics --full-refresh`
- Run the Elementary tests again.
edr report --days-back 45