Getting Started
- Introduction
- Quickstart
- dbt package
Anomaly Detection Monitors
- Monitors overview
- Automated monitors
- Metrics
- Configuration and Feedback
Alerts and Incidents
- Alerts & incidents overview
- Setup & configure alerts
- Incidents
- Incident management
Performance & Cost
Data Governance
Collaboration & Communication
- Data Observability Dashboard
- Data Health Scores
- Data Catalog
Additional features
Guides
- Setup Elementary
- Send alerts
- Environment Syncs Schedule
- Collect jobs data
- dbt source freshness
- Troubleshooting
Integrations
- All integrations
- Data warehouses
- Transformation & Orchestration
- Data visualization
- Reverse ETL
- Code repositories
- Alerts & Incidents
- Data Catalog
Adding Elementary Tests
This tutorial will add a few common tests to our data set.
To read more about Elementary’s suite of tests click here.
There are 3 models in our tutorial:
- Customers
- Orders
- Returned Orders
In Elementary Data, tests are added to the schema.yml
file. We will now navigate there and add tests for each of our models.
A schema.yml
file that includes all the below tests can be found at the bottom of this page, so there is no need to manually add each test.
First, we will use the tables_anomalies test to perform a row_count. This tests counts the number of rows created in a given period to determine if there have been any anomalies in the number of signups in a given time period.
We will add the volume_anomalies test as follows:
models:
- name: customers
description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
config:
tags: ["PII"]
tests:
- elementary.volume_anomalies
Now that we have configured a test, we should define a column to use for our timestamp. This will be used to create time buckets for anomaly detection. We select the signup_date column as seen below:
models:
- name: customers
description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
config:
tags: ["PII"]
elementary:
timestamp_column: "signup_date"
tests:
- elementary.volume_anomalies
This test will fail if there are any days (as defined by signup_date) where the number of rows exceeds 3 standard deviations above/below the mean.
Similar to Test 1, we will use the volume_anomalies test to detect an anomalous number of returned orders in a given time period. In this test, however, we will define the timestamp column at the test level - instead of at the model level.
- name: returned_orders
description: This table contains all of the returned orders
config:
tags: ["finance"]
tests:
- elementary.volume_anomalies:
tags: ["table_anomalies"]
timestamp_column: "order_date"
The elementary.dimension_anomalies tests can be used to check for anomalies in the distribution of data in low cardinality fields. First, we will select the status column in the Orders model to run this test on:
- name: orders
description: This table has basic information about orders, as well as some derived facts based on payments
config:
tags: ["finance"]
tests:
- elementary.dimension_anomalies:
dimensions:
- status
Next, we will define a timestamp column for determining time buckets:
- name: orders
description: This table has basic information about orders, as well as some derived facts based on payments
config:
tags: ["finance"]
elementary:
timestamp_column: "order_date"
tests:
- elementary.dimension_anomalies:
dimensions:
- status
This test will gather row count metrics for each value in the status column and will fail if the distribution deviates substantially from the mean.
We will use the elementary.column_anomalies test to monitor the count of customers with 0 orders. To do so, we will define a zero_count test under elementary.column_anomalies for the number_of_orders column:
- name: number_of_orders
description: Count of the number of orders a customer has placed
tests:
- elementary.column_anomalies:
config:
severity: warn
tags: ["column_anomalies"]
column_anomalies:
- zero_count
timestamp_column: "signup_date"
Notice that we already defined the timestamp_column at the model level, so we don’t have to define it in the test. This page has in-depth details on test priorities.
Copy the text below into your schema.yml
file to add all of the above tests to our project:
version: 2
models:
- name: customers
description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
config:
tags: ["PII"]
elementary:
timestamp_column: "signup_date"
tests:
- elementary.volume_anomalies
columns:
- name: customer_id
description: This is a unique identifier for a customer
- name: first_name
description: Customer's first name. PII.
- name: last_name
description: Customer's last name. PII.
- name: first_order
description: Date (UTC) of a customer's first order
- name: most_recent_order
description: Date (UTC) of a customer's most recent order
- name: number_of_orders
description: Count of the number of orders a customer has placed
tests:
- elementary.column_anomalies:
config:
severity: warn
tags: ["column_anomalies"]
column_anomalies:
- zero_count
timestamp_column: "signup_date"
- name: customer_lifetime_value
description: Total value (AUD) of a customer's orders
- name: customer_email
description: Customer's email. PII.
- name: signup_date
description: Date (UTC) of a customer's signup to the online shop.
- name: orders
description: This table has basic information about orders, as well as some derived facts based on payments
config:
tags: ["finance"]
elementary:
timestamp_column: "order_date"
tests:
- elementary.dimension_anomalies:
dimensions:
- status
columns:
- name: order_id
description: This is a unique identifier for an order
- name: customer_id
description: Foreign key to the customers table
- name: order_date
description: Date (UTC) that the order was placed
- name: status
description: '{{ doc("orders_status") }}'
- name: amount
description: Total amount (AUD) of the order
- name: credit_card_amount
description: Amount of the order (AUD) paid for by credit card
- name: coupon_amount
description: Amount of the order (AUD) paid for by coupon
- name: bank_transfer_amount
description: Amount of the order (AUD) paid for by bank transfer
- name: gift_card_amount
description: Amount of the order (AUD) paid for by gift card
- name: returned_orders
description: This table contains all of the returned orders
config:
tags: ["finance"]
tests:
- elementary.volume_anomalies:
tags: ["table_anomalies"]
timestamp_column: "order_date"
columns:
- name: order_id
description: This is a unique identifier for an order
- name: customer_id
description: Foreign key to the customers table
- name: order_date
description: Date (UTC) that the order was placed
- name: status
description: '{{ doc("orders_status") }}'
- name: amount
description: Total amount (AUD) of the order
- name: credit_card_amount
description: Amount of the order (AUD) paid for by credit card
- name: coupon_amount
description: Amount of the order (AUD) paid for by coupon
- name: bank_transfer_amount
description: Amount of the order (AUD) paid for by bank transfer
- name: gift_card_amount
description: Amount of the order (AUD) paid for by gift card
You have now added Elementary tests to your project. Now let’s see how they are run and visualized.