column_anomalies
models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
              time_bucket: # Daily by default
                period: < time period >
                count: < number of periods >

  - name: < model name >
    ## if no timestamp is configured, elementary will monitor without time filtering
    columns:
      - name: < column name >
        tests:
          - elementary.column_anomalies:
              column_anomalies: < specific monitors, all if null >
              where_expression: < sql expression >
elementary.column_anomalies
Executes column-level monitors and anomaly detection on the column.
Specific monitors are detailed in the tables below and can be configured using the column_anomalies
configuration.
The test checks the data type of the column and only executes monitors that are relevant to it.
Default monitors by type:
Data quality metric | Column Type |
---|---|
null_count | any |
null_percent | any |
min_length | string |
max_length | string |
average_length | string |
missing_count | string |
missing_percent | string |
min | numeric |
max | numeric |
zero_count | numeric |
zero_percent | numeric |
standard_deviation | numeric |
variance | numeric |
Opt-in monitors by type:
Data quality metric | Column Type |
---|---|
sum | numeric |
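
As an illustration of the tables above, the following sketch restricts the test to a few named monitors and adds the opt-in sum monitor by listing it explicitly. The model name `orders` and the column name `amount` are hypothetical placeholders, and the choice of monitors is only an example, not a recommendation.

```yaml
models:
  - name: orders                 # hypothetical model name
    columns:
      - name: amount             # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count     # default monitor, any column type
                - zero_count     # default monitor, numeric columns
                - sum            # opt-in monitor, listed explicitly here
```

Because the test only executes monitors relevant to the column's data type, string monitors such as max_length would not be expected to run on a numeric column like this one.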
Test configuration
No configuration is mandatory; however, configuring a timestamp_column is highly recommended. Without it, Elementary monitors the column without time filtering.
tests:
  - elementary.column_anomalies:
      column_anomalies: column monitors list
      timestamp_column: column name
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      detection_period:
        period: [hour | day | week | month]
        count: int
      training_period:
        period: [hour | day | week | month]
        count: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week
      detection_delay:
        period: [hour | day | week | month]
        count: int
      ignore_small_changes:
        spike_failure_percent_threshold: int
        drop_failure_percent_threshold: int
      anomaly_exclude_metrics: [SQL expression]
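
To make the parameter listing above more concrete, here is a minimal sketch with some of the placeholders filled in. The model, column, and timestamp names (`orders`, `amount`, `updated_at`), the `where_expression` filter, and every numeric value are assumptions chosen for illustration; they are not defaults or recommended settings.

```yaml
models:
  - name: orders                          # hypothetical model name
    columns:
      - name: amount                      # hypothetical numeric column
        tests:
          - elementary.column_anomalies:
              timestamp_column: updated_at          # hypothetical timestamp column
              where_expression: "status = 'completed'"  # illustrative filter
              anomaly_sensitivity: 3                # illustrative value
              anomaly_direction: both
              detection_period:
                period: day
                count: 2                            # illustrative value
              training_period:
                period: day
                count: 14                           # illustrative value
              time_bucket:
                period: day
                count: 1
              seasonality: day_of_week
              ignore_small_changes:
                spike_failure_percent_threshold: 10 # illustrative value
                drop_failure_percent_threshold: 10  # illustrative value
```

Parameters left out of the sketch, such as column_anomalies, detection_delay, and anomaly_exclude_metrics, can be omitted because no configuration is mandatory.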