

First, check if your test uses a timestamp column:
# In your YAML configuration
data_tests:
  - elementary.volume_anomalies:
      arguments:
        timestamp_column: created_at # If this is configured, you have a timestamp-based test
  • Training period data builds up over multiple test runs, using the test run time as its timestamp column. This requires time to collect enough points; for a 14-day training period, the test needs 14 runs on 14 different days to accumulate a full training set.
  • Metrics are calculated for the entire table in each test run
  • Detection period (default: 2 days) determines how many buckets are being tested
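The training and detection windows above can be tuned per test. A hedged configuration sketch, assuming the training_period and detection_period arguments documented for Elementary anomaly tests (values are illustrative):

```yaml
data_tests:
  - elementary.volume_anomalies:
      arguments:
        timestamp_column: created_at
        training_period:
          period: day
          count: 14     # 14 daily buckets of training data
        detection_period:
          period: day
          count: 2      # the window being tested (default: 2 days)
```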
Check metric collection across test runs:
-- Check metrics from different test runs
SELECT
    updated_at,
    metric_value
FROM your_schema.data_monitoring_metrics
WHERE full_table_name = 'your_table'
ORDER BY updated_at DESC;
  • Should see one metric per test run and per dimension
  • Training requires multiple test runs over time
  • Each new test run creates a training point for a time bucket. A second test run within the same bucket overrides the first one.
  • The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
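For example, to filter by the fully qualified name (the database, schema, and table names here are illustrative):

```sql
-- full_table_name uses the format DATABASE.SCHEMA.TABLE_NAME
SELECT
    updated_at,
    metric_name,
    metric_value
FROM your_schema.data_monitoring_metrics
WHERE full_table_name = 'ANALYTICS.PUBLIC.ORDERS'
ORDER BY updated_at DESC;
```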
Common collection issues:
  • Test hasn’t run enough times
  • Previous test runs failed
  • Metrics not being saved between runs
Anomaly detection is influenced by:
  • Detection period (default: 2 days) - the time window being tested
  • Sensitivity (default: 3.0) - how many standard deviations from normal before flagging
  • Training data from previous periods/runs
  • metrics_anomaly_score calculates the anomaly score based on the data in data_monitoring_metrics.
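The sensitivity threshold is also configurable per test. A hedged sketch, assuming the anomaly_sensitivity argument documented for Elementary anomaly tests:

```yaml
data_tests:
  - elementary.volume_anomalies:
      arguments:
        anomaly_sensitivity: 3.0   # standard deviations from normal before flagging
```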
Check calculations in metrics_anomaly_score:
-- Check how anomalies are being calculated
SELECT
    metric_name,
    latest_metric_value,
    training_avg,
    training_stddev,
    anomaly_score,
    is_anomaly
FROM your_schema.metrics_anomaly_score
WHERE full_table_name = 'your_table'
ORDER BY bucket_end DESC;
  • anomaly_score: The standardized score that measures how many standard deviations a data point is from the mean
  • is_anomaly: A boolean field that indicates whether the anomaly score exceeds the configured threshold
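The two fields above can be sketched as a plain z-score calculation. This is a minimal illustration of the idea, not Elementary's actual model code; function and variable names are illustrative:

```python
def anomaly_score(latest_metric_value: float, training_avg: float,
                  training_stddev: float) -> float:
    """How many standard deviations the latest value is from the training mean."""
    if training_stddev == 0:
        return 0.0  # no variation in training data; nothing to standardize against
    return (latest_metric_value - training_avg) / training_stddev


def is_anomaly(score: float, sensitivity: float = 3.0) -> bool:
    """Flag when the absolute score exceeds the configured sensitivity."""
    return abs(score) > sensitivity


# Example: row count jumps from a stable ~1000 rows/day to 1450.
score = anomaly_score(latest_metric_value=1450, training_avg=1000, training_stddev=100)
print(round(score, 2), is_anomaly(score))  # 4.5 True
```

With the default sensitivity of 3.0, a value 4.5 standard deviations above the training mean is flagged, while a 2.0-deviation fluctuation would pass.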
A "not enough training data" result occurs when there are fewer than 7 training data points. To resolve:

For timestamp-based tests:

  • Check if your timestamp column has enough historical data
  • Verify time buckets are being created correctly in data_monitoring_metrics
  • Look for gaps in your data that might affect bucket creation

For non-timestamp tests:

  • Run your tests multiple times to build up training data.
  • Check data_monitoring_metrics to verify the data collection. The test needs data for at least 7 time buckets (e.g., 7 days) to calculate the anomaly.
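The 7-bucket requirement can be sanity-checked as follows; this is an illustrative sketch (helper names are not part of Elementary), counting distinct buckets since a rerun within the same bucket overrides the earlier point rather than adding one:

```python
from datetime import date, timedelta

MIN_TRAINING_POINTS = 7  # minimum distinct time buckets before scoring


def has_enough_training_data(bucket_dates: list[date]) -> bool:
    """True when at least 7 distinct time buckets have collected metrics."""
    return len(set(bucket_dates)) >= MIN_TRAINING_POINTS


# Six daily runs so far -> not enough; a seventh distinct day unlocks scoring.
runs = [date(2024, 1, 1) + timedelta(days=i) for i in range(6)]
print(has_enough_training_data(runs))                        # False
print(has_enough_training_data(runs + [date(2024, 1, 7)]))   # True
```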
If your test isn’t appearing in data_monitoring_metrics:

Verify test configuration:
data_tests:
  - elementary.volume_anomalies:
      arguments:
        timestamp_column: created_at # Check if specified correctly

Common causes:

  • Incorrect timestamp column name
  • Timestamp column contains null values or is not of type timestamp or date
  • For non-timestamp tests: Test hasn’t run successfully
  • Incorrect test syntax
If you change the training_period after executing Elementary tests, you will need to run a full refresh of the collected metrics. This makes the next test runs collect data for the new training_period timeframe. The steps are:
  1. Change var training_period in your dbt_project.yml.
  2. Full refresh of the data_monitoring_metrics model by running dbt run --select data_monitoring_metrics --full-refresh.
  3. Running the elementary tests again.
If you want the Elementary UI to show data for a longer period of time, use the days-back option of the CLI: edr report --days-back 45