all_columns_anomalies
elementary.all_columns_anomalies
Executes column-level monitors and anomaly detection on all the columns of the table.
Specific monitors are detailed in the tables below and can be configured using the column_anomalies configuration.
The test checks the data type of each column and only executes monitors that are relevant to it.
You can use the column_anomalies param to override the default monitors, and exclude_prefix / exclude_regexp to exclude columns from the test.
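As a minimal sketch of the override and exclusion options described above, the snippet below attaches the test to a model and limits it to two monitors while skipping some columns. The file path, the model name `orders`, the column `updated_at`, and the `tmp_` prefix are hypothetical and used only for illustration.

```yaml
# models/schema.yml (hypothetical file)
version: 2

models:
  - name: orders                       # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          timestamp_column: updated_at # hypothetical timestamp column
          # Run only these monitors instead of the defaults
          column_anomalies:
            - null_count
            - null_percent
          # Skip every column whose name starts with this prefix
          exclude_prefix: "tmp_"
```

To exclude columns by pattern instead of by prefix, exclude_regexp can be set in place of exclude_prefix.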
Default monitors by type:
Data quality metric | Column Type |
---|---|
null_count | any |
null_percent | any |
min_length | string |
max_length | string |
average_length | string |
missing_count | string |
missing_percent | string |
min | numeric |
max | numeric |
average | numeric |
zero_count | numeric |
zero_percent | numeric |
standard_deviation | numeric |
variance | numeric |
Opt-in monitors by type:
Data quality metric | Column Type |
---|---|
sum | numeric |
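The sketch below shows one way an opt-in monitor might be enabled, assuming that opt-in monitors run only when they are listed explicitly in column_anomalies. Because that list overrides the defaults, any default monitors you still want are listed alongside sum. The model name `payments` and column `created_at` are hypothetical.

```yaml
version: 2

models:
  - name: payments                     # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          timestamp_column: created_at # hypothetical timestamp column
          column_anomalies:
            - sum                      # opt-in monitor, numeric columns only
            - null_count               # default monitors kept explicitly
            - average
```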
Test configuration
No mandatory configuration; however, it is highly recommended to configure a timestamp_column.
tests:
  - elementary.all_columns_anomalies:
      timestamp_column: column name
      column_anomalies: column monitors list
      dimensions: sql expression
      exclude_prefix: string
      exclude_regexp: regex
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
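For reference, here is a sketch of the template with concrete values filled in. All values are illustrative assumptions, not recommendations, and the model name `orders`, the column `updated_at`, and the filter on `status` are hypothetical.

```yaml
version: 2

models:
  - name: orders                         # hypothetical model name
    tests:
      - elementary.all_columns_anomalies:
          timestamp_column: updated_at   # hypothetical timestamp column
          where_expression: "status != 'test'"  # hypothetical row filter
          anomaly_sensitivity: 3
          anomaly_direction: both
          detection_period:
            period: day
            count: 2
          training_period:
            period: day
            count: 14
          time_bucket:
            period: day
            count: 1
          seasonality: day_of_week
          ignore_small_changes:
            spike_failure_percent_threshold: 10
            drop_failure_percent_threshold: 10
```

Any parameter left out of the block falls back to its default, so in practice only the settings you want to change need to appear.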