volume_anomalies
elementary.volume_anomalies
Monitors the row count of your table over time, per time bucket (if configured without timestamp_column, it counts the table's total rows).
When the test runs, your data is split into time buckets (daily by default, configurable with the time_bucket field), and the row count is computed per bucket for the last days_back days (14 by default).
The test then compares the row count of each bucket within the detection period (the last 2 days by default, configured with backfill_days) to the row counts of the previous time buckets.
The test only runs on completed time buckets, so if you run it with daily buckets in the middle of today, the latest bucket it evaluates is yesterday's. If any anomaly is found during the detection period, the test fails.
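As a rough illustration, the defaults described above could be written out explicitly as in the sketch below. The model name `orders` and the column `created_at` are hypothetical placeholders. Setting `count: 1` is an assumption about how "daily" is expressed in the time_bucket field.

```yaml
models:
  - name: orders                        # hypothetical model name
    tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at  # hypothetical timestamp column
          days_back: 14                 # row counts are computed for the last 14 days
          backfill_days: 2              # detection period: the last 2 days
          time_bucket:                  # daily buckets
            period: day
            count: 1                    # assumed: one day per bucket
```

With this configuration, a run in the middle of the day would evaluate buckets up to and including yesterday, since today's bucket is not yet complete.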
Test configuration
No configuration is mandatory; however, it is highly recommended to configure a timestamp_column.
tests:
  - elementary.volume_anomalies:
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      days_back: int
      backfill_days: int
      min_training_set_size: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
models:
  - name: < model name >
    tests:
      - elementary.volume_anomalies:
          timestamp_column: < timestamp column >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
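A filled-in version of the template might look like the sketch below. All values are illustrative assumptions rather than recommendations: the model name `page_views`, the column `event_time`, and the filter expression are hypothetical. The parameter names come from the configuration list above.

```yaml
models:
  - name: page_views                      # hypothetical model name
    tests:
      - elementary.volume_anomalies:
          timestamp_column: event_time    # hypothetical timestamp column
          where_expression: "country = 'US'"  # hypothetical filter, applied before counting rows
          anomaly_direction: drop         # only flag unexpectedly low row counts
          time_bucket:
            period: hour                  # hourly buckets instead of the daily default
            count: 1
```

Hourly buckets suit tables that load continuously throughout the day. For tables that load once a day, the daily default is usually the simpler choice.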