Generate your anomaly test with Elementary AI

Let our Slack chatbot create the anomaly test you need.
First, check if your test uses a timestamp column:
# In your YAML configuration
tests:
  - elementary.volume_anomalies:
      timestamp_column: created_at  # If this is configured, you have a timestamp-based test
  • Training period data builds up over multiple test runs, using the test run time as its timestamp column. This requires time to collect enough points; for a 14-day training period, the test would need 14 runs on 14 different days to have a full training set.
  • Metrics are calculated for the entire table in each test run
  • Detection period (default: 2 days) determines how many buckets are being tested
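Both windows are configurable per test. A sketch of overriding them in YAML (parameter names follow Elementary's anomaly-test configuration and the defaults mentioned above; verify against your Elementary version's documentation):

```yaml
tests:
  - elementary.volume_anomalies:
      timestamp_column: created_at
      # Override the training window (default mentioned above: 14 days)
      training_period:
        period: day
        count: 14
      # Override the detection window (default: 2 days)
      detection_period:
        period: day
        count: 2
```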
Check metric collection across test runs:
-- Check metrics from different test runs
SELECT
updated_at,
metric_value
FROM your_schema.data_monitoring_metrics
WHERE full_table_name = 'your_table'
ORDER BY updated_at DESC;
  • Should see one metric per test run and per dimension
  • Training requires multiple test runs over time
  • Each new test run creates a training point for a time bucket. A second test run within the same bucket will overwrite the first one.
  • The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
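The per-bucket overwrite behavior above can be sketched as follows (a simplified illustration with made-up run data, not Elementary's implementation):

```python
# Two runs landing in the same time bucket keep only the latest metric value.
runs = [
    ("2024-01-01", "run_1", 100),
    ("2024-01-01", "run_2", 105),  # same bucket: overwrites run_1's point
    ("2024-01-02", "run_3", 98),
]

training = {}
for bucket, run_id, value in runs:
    training[bucket] = value  # a later run replaces the earlier one per bucket

print(training)  # {'2024-01-01': 105, '2024-01-02': 98}
```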
Common collection issues:
  • Test hasn’t run enough times
  • Previous test runs failed
  • Metrics not being saved between runs
Anomaly detection is influenced by:
  • Detection period (default: 2 days) - the time window being tested
  • Sensitivity (default: 3.0) - how many standard deviations from normal before flagging
  • Training data from previous periods/runs
  • metrics_anomaly_score calculates the anomaly based on the data in data_monitoring_metrics.
Check calculations in metrics_anomaly_score:
-- Check how anomalies are being calculated
SELECT
    metric_name,
    latest_metric_value,
    training_avg,
    training_stddev,
    anomaly_score,
    is_anomaly
FROM your_schema.metrics_anomaly_score
WHERE full_table_name = 'your_table'
ORDER BY bucket_end DESC;
  • anomaly_score: The standardized score that measures how many standard deviations a data point is from the mean
  • is_anomaly: A boolean field that indicates whether the anomaly score exceeds the configured threshold
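The relationship between these two fields is a standard-score check. A minimal sketch (illustrative only, not Elementary's exact implementation):

```python
from statistics import mean, stdev

def anomaly_score(training_values, latest_value):
    """How many standard deviations latest_value is from the training mean."""
    avg = mean(training_values)
    sd = stdev(training_values)
    return (latest_value - avg) / sd

def is_anomaly(score, sensitivity=3.0):
    """Flag the point when the absolute score exceeds the sensitivity threshold."""
    return abs(score) > sensitivity

# Example: stable daily row counts, then a sudden drop
training = [100, 102, 98, 101, 99, 100, 103]
score = anomaly_score(training, 60)
print(round(score, 1), is_anomaly(score))  # prints: -23.5 True
```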
A “not enough training data” result occurs when there are fewer than 7 training data points. To resolve:

For timestamp-based tests:

  • Check if your timestamp column has enough historical data
  • Verify time buckets are being created correctly in data_monitoring_metrics
  • Look for gaps in your data that might affect bucket creation

For non-timestamp tests:

  • Run your tests multiple times to build up training data.
  • Check data_monitoring_metrics to verify the data collection. The test needs data for at least 7 time buckets (e.g., 7 days) to calculate an anomaly.
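A quick way to sanity-check the 7-bucket requirement is to count distinct time buckets in the rows returned by the metrics query above. A sketch with made-up sample data (in practice you would load the rows from data_monitoring_metrics):

```python
from datetime import date

# Hypothetical (bucket, metric_value) rows pulled from data_monitoring_metrics
rows = [(date(2024, 1, d), 100 + d) for d in range(1, 6)]  # only 5 daily buckets

distinct_buckets = {bucket for bucket, _ in rows}
MIN_TRAINING_BUCKETS = 7  # minimum training points mentioned above

if len(distinct_buckets) < MIN_TRAINING_BUCKETS:
    print(f"Not enough training data: {len(distinct_buckets)}/{MIN_TRAINING_BUCKETS} buckets")
```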
If your test isn’t appearing in data_monitoring_metrics, verify the test configuration:
tests:
  - elementary.volume_anomalies:
      timestamp_column: created_at  # Check if specified correctly

Common causes:

  • Incorrect timestamp column name
  • Timestamp column contains null values or is not of type timestamp or date
  • For non-timestamp tests: Test hasn’t run successfully
  • Incorrect test syntax
If you change the training_period var after executing Elementary tests, you will need to run a full refresh of the collected metrics. This makes the next test runs collect data for the new training_period timeframe. The steps are:
  1. Change the var training_period in your dbt_project.yml.
  2. Fully refresh the data_monitoring_metrics model by running dbt run --select data_monitoring_metrics --full-refresh.
  3. Run the Elementary tests again.
If you want the Elementary UI to show data for a longer period of time, use the days-back option of the CLI: edr report --days-back 45