elementary.all_columns_anomalies

Executes column-level monitors and anomaly detection on all the columns of the table. The specific monitors are listed in the tables below and can be configured using the column_anomalies configuration.

The test checks the data type of each column and only executes the monitors that are relevant to that type. Use the column_anomalies parameter to override the default monitors, and exclude_prefix / exclude_regexp to exclude columns from the test.
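As a minimal sketch of the override and exclusion parameters described above: the column names, prefix, and regular expression are hypothetical, and the sketch assumes exclude_prefix and exclude_regexp are matched against column names.

```yaml
tests:
  - elementary.all_columns_anomalies:
      timestamp_column: updated_at   # hypothetical column name
      column_anomalies:              # run only these monitors instead of the defaults
        - null_count
        - missing_percent
      exclude_prefix: "tmp_"         # hypothetical: skip columns starting with tmp_
      exclude_regexp: ".*_raw$"      # hypothetical: skip columns ending with _raw
```

Monitors in the list still run only on columns of a relevant type, so missing_percent here would apply to string columns only.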

Default monitors by type:

Data quality metric    Column type
null_count             any
null_percent           any
min_length             string
max_length             string
average_length         string
missing_count          string
missing_percent        string
min                    numeric
max                    numeric
zero_count             numeric
zero_percent           numeric
standard_deviation     numeric
variance               numeric

Opt-in monitors by type:

Data quality metric    Column type
sum                    numeric
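A hedged sketch of opting in to sum: it assumes that an opt-in monitor is enabled by naming it in column_anomalies, and that listing monitors there replaces the default set, so any defaults you still want are listed alongside it.

```yaml
tests:
  - elementary.all_columns_anomalies:
      column_anomalies:
        - sum            # opt-in monitor, numeric columns
        - null_count     # default monitor kept explicitly
        - zero_count     # default monitor kept explicitly
```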

Test configuration

No configuration is mandatory; however, configuring a timestamp_column is highly recommended.

tests:
  - elementary.all_columns_anomalies:
      timestamp_column: column name
      column_anomalies: column monitors list
      exclude_prefix: string
      exclude_regexp: regex
      where_expression: sql expression
      anomaly_sensitivity: int
      anomaly_direction: [both | spike | drop]
      days_back: int
      backfill_days: int
      min_training_set_size: int
      time_bucket:
        period: [hour | day | week | month]
        count: int
      seasonality: day_of_week

models:
  - name: < model name >
    config:
      elementary:
        timestamp_column: < timestamp column >
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies: < specific monitors, all if null >
          where_expression: < sql expression >
          time_bucket: # Daily by default
            period: < time period >
            count: < number of periods >
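For illustration, the template above might be filled in as follows. The model name, column names, and filter expression are hypothetical; the keys and allowed values are the ones listed under Test configuration.

```yaml
models:
  - name: orders                      # hypothetical model name
    config:
      elementary:
        timestamp_column: created_at  # hypothetical timestamp column
    tests:
      - elementary.all_columns_anomalies:
          column_anomalies:
            - null_percent
            - zero_count
          where_expression: "status != 'test'"  # hypothetical SQL filter
          time_bucket:                # weekly buckets instead of the daily default
            period: week
            count: 1
```

Setting timestamp_column once under the model's config, as in the template, keeps it out of the individual test definition.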