dimension_anomalies
elementary.dimension_anomalies
The test counts rows grouped by the given dimensions (columns or SQL expressions).
In practice, this test monitors the frequency of values in the configured dimensions over time and alerts on unexpected changes in their distribution. It works best on low-cardinality fields.
If timestamp_column is configured, the distribution is collected per time_bucket. If not, the test counts the total rows per dimension.
Test configuration
Required configuration: dimensions
tests:
  - elementary.dimension_anomalies:
      dimensions: sql expression
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      days_back: int
      backfill_days: int
      min_training_set_size: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
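As a rough illustration of how the parameters listed above might be filled in, here is a minimal sketch. The model name, column names, and every value are hypothetical and are not recommended settings; writing dimensions as a YAML list is also an assumption about the accepted form.

```yaml
# Illustrative sketch only: names and values are hypothetical.
models:
  - name: orders                      # hypothetical model
    tests:
      - elementary.dimension_anomalies:
          dimensions:                 # assumed list form; low-cardinality columns
            - country
            - platform
          timestamp_column: created_at   # hypothetical column, set at test level
          anomaly_sensitivity: 3         # example value, not a recommendation
          anomaly_direction: drop        # alert only when a dimension's row count drops
          days_back: 30
          backfill_days: 2
          seasonality: day_of_week
```

Setting anomaly_direction to drop, as in this sketch, would limit alerts to dimension values whose row count falls unexpectedly, which can be useful when spikes are not a concern.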
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - configure a where expression to refine the dimension monitoring
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
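The template above could be filled in along these lines. This is a sketch under stated assumptions: the model login_events, the columns loaded_at, event_type, and country_name, and the filter value are all hypothetical, and the list form of dimensions is assumed.

```yaml
# Illustrative sketch only: model, columns, and filter are hypothetical.
models:
  - name: login_events
    config:
      elementary:
        timestamp_column: loaded_at     # model-level timestamp, as in the template
    tests:
      - elementary.dimension_anomalies:
          dimensions:                   # assumed list form
            - event_type
            - country_name
          # exclude rows that should not count toward the distribution
          where_expression: "country_name != 'Unwanted country'"
          time_bucket:                  # collect the distribution in 4-hour buckets
            period: hour
            count: 4
```

With a timestamp column configured, row counts per event_type and country_name combination would be collected for each 4-hour bucket rather than as a single total per dimension.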