dimension_anomalies
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.dimension_anomalies:
          dimensions: < columns or sql expressions of columns >
          # optional - configure a where expression to make the dimension monitoring more accurate
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
elementary.dimension_anomalies
The test counts rows grouped by the given dimensions (columns or SQL expressions).
In practice, it monitors the frequency of values in the configured dimensions over time and alerts on unexpected changes in their distribution. It works best on low-cardinality fields.
If timestamp_column is configured, the distribution is collected per time_bucket. If not, the test counts the total rows per dimension.
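To make the time-bucketed behavior concrete, here is a minimal sketch of a daily-bucketed setup. The model name `orders` and the columns `created_at` and `country` are hypothetical, and writing `dimensions` as a YAML list is an assumption about the accepted syntax.

```yaml
models:
  - name: orders                       # hypothetical model name
    config:
      elementary:
        timestamp_column: created_at   # hypothetical column; with it set, rows are counted per time bucket
    tests:
      - elementary.dimension_anomalies:
          dimensions:
            - country                  # hypothetical low-cardinality column
          time_bucket:                 # one-day buckets, written out explicitly
            period: day
            count: 1
```

With a configuration along these lines, the test would compare each day's row count per `country` value against the earlier days. Dropping `timestamp_column` would instead count the total rows per `country`.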
Test configuration
Required configuration: dimensions
tests:
  - elementary.dimension_anomalies:
      dimensions: sql expression
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
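As an illustration of how several of these options might be combined, the sketch below fills in a subset of them with concrete values. Every value is an illustrative assumption rather than a recommended or default setting, the column names `platform` and `is_test_event` are hypothetical, and the underscore spelling of `detection_period` and `training_period` is assumed.

```yaml
tests:
  - elementary.dimension_anomalies:
      dimensions:
        - platform                       # hypothetical column
      where_expression: "is_test_event = false"   # hypothetical filter applied before counting
      anomaly_sensitivity: 3             # illustrative value
      anomaly_direction: drop            # alert only when a dimension's row count falls
      detection_period:                  # illustrative window checked for anomalies
        period: day
        count: 2
      training_period:                   # illustrative window used as the baseline
        period: day
        count: 30
      time_bucket:
        period: day
        count: 1
      seasonality: day_of_week
```

Limiting `anomaly_direction` to `drop` is a choice that suits dimensions where a missing value matters more than a surge, for example when one platform stops sending events.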