column_anomalies
elementary.column_anomalies
Executes column-level monitors and anomaly detection on the column.
The specific monitors are detailed in the tables below and can be configured using the column_anomalies configuration.
The test checks the data type of the column and only executes monitors that are relevant to it.
Default monitors by type:
Data quality metric | Column Type |
---|---|
null_count | any |
null_percent | any |
min_length | string |
max_length | string |
average_length | string |
missing_count | string |
missing_percent | string |
min | numeric |
max | numeric |
average | numeric |
zero_count | numeric |
zero_percent | numeric |
standard_deviation | numeric |
variance | numeric |
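As an illustration, the sketch below limits a string column to a subset of its default monitors by listing them under column_anomalies. The model name orders and the column customer_email are hypothetical. The sketch assumes, as the configuration description suggests, that an explicit list replaces the default set rather than adding to it.

```yaml
models:
  - name: orders                  # hypothetical model name
    columns:
      - name: customer_email      # hypothetical string column
        tests:
          - elementary.column_anomalies:
              # Assumed: only the monitors listed here run on this column
              column_anomalies:
                - null_count
                - missing_percent
                - max_length
```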
Opt-in monitors by type:
Data quality metric | Column Type |
---|---|
sum | numeric |
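Because sum is opt-in, it is expected to run only when it is named explicitly. The following is a minimal sketch with a hypothetical numeric column order_amount on a hypothetical orders model. Any default monitors you still want are listed alongside it, based on the same assumption that an explicit list replaces the defaults.

```yaml
models:
  - name: orders                  # hypothetical model name
    columns:
      - name: order_amount        # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - sum             # opt-in monitor, named explicitly
                - average         # default numeric monitor, kept by listing it
                - null_count      # default monitor for any column type
```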
Test configuration
No configuration is mandatory; however, configuring a timestamp_column is highly recommended.
```yaml
tests:
  - elementary.column_anomalies:
      column_anomalies: column monitors list
      dimensions: sql expression
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
```
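The configuration reference lists placeholders only. A filled-in configuration might look like the following sketch. The model name, column names, filter, and every numeric value are hypothetical and chosen for illustration; they are not recommended settings. The sketch omits dimensions and anomaly_exclude_metrics because the exact format of their SQL expressions is not spelled out here.

```yaml
models:
  - name: orders                              # hypothetical model name
    columns:
      - name: order_amount                    # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count
                - average
                - sum
              timestamp_column: created_at    # hypothetical timestamp column
              where_expression: "status != 'test'"   # hypothetical row filter
              anomaly_sensitivity: 3          # illustrative value
              anomaly_direction: drop         # only flag decreases
              time_bucket:                    # one data point per day
                period: day
                count: 1
              training_period:                # illustrative: learn from 14 days
                period: day
                count: 14
              detection_period:               # illustrative: check the last 2 days
                period: day
                count: 2
              ignore_small_changes:
                # Assumed meaning: a drop must exceed 10% to fail the test
                drop_failure_percent_threshold: 10
```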