Skip to main content
This tutorial will add a few common tests to our data set. To read more about Elementary’s suite of tests click here.
There are 3 models in our tutorial:
  • Customers
  • Orders
  • Returned Orders
In Elementary Data, tests are added to the schema.yml file. We will now navigate there and add tests for each of our models.
A schema.yml file that includes all the below tests can be found at the bottom of this page, so there is no need to manually add each test.

First, we will use the tables_anomalies test to perform a row_count. This tests counts the number of rows created in a given period to determine if there have been any anomalies in the number of signups in a given time period.We will add the volume_anomalies test as follows:
models:
  - name: customers
    description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
    config:
      tags: ["PII"]
    data_tests:
      - elementary.volume_anomalies
Now that we have configured a test, we should define a column to use for our timestamp. This will be used to create time buckets for anomaly detection. We select the signup_date column as seen below:
models:
  - name: customers
    description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
    config:
      tags: ["PII"]
      elementary:
        timestamp_column: "signup_date"
    data_tests:
      - elementary.volume_anomalies
This test will fail if there are any days (as defined by signup_date) where the number of rows exceeds 3 standard deviations above/below the mean.

Similar to Test 1, we will use the volume_anomalies test to detect an anomalous number of returned orders in a given time period. In this test, however, we will define the timestamp column at the test level - instead of at the model level.
  - name: returned_orders
    description: This table contains all of the returned orders
    config:
      tags: ["finance"]

    data_tests:
      - elementary.volume_anomalies:
          config:
            tags: ["table_anomalies"]
          arguments:
            timestamp_column: "order_date"


The elementary.dimension_anomalies tests can be used to check for anomalies in the distribution of data in low cardinality fields. First, we will select the status column in the Orders model to run this test on:
  - name: orders
    description: This table has basic information about orders, as well as some derived facts based on payments
    config:
      tags: ["finance"]

    data_tests:
      - elementary.dimension_anomalies:
          arguments:
            dimensions:
              - status
Next, we will define a timestamp column for determining time buckets:
- name: orders
  description: This table has basic information about orders, as well as some derived facts based on payments
  config:
    tags: ["finance"]
    elementary:
      timestamp_column: "order_date"

  data_tests:
    - elementary.dimension_anomalies:
        arguments:
          dimensions:
            - status
This test will gather row count metrics for each value in the status column and will fail if the distribution deviates substantially from the mean.

We will use the elementary.column_anomalies test to monitor the count of customers with 0 orders. To do so, we will define a zero_count test under elementary.column_anomalies for the number_of_orders column:
      - name: number_of_orders
        description: Count of the number of orders a customer has placed
        data_tests:
          - elementary.column_anomalies:
              config:
                severity: warn
                tags: ["column_anomalies"]
              arguments:
                column_anomalies:
                  - zero_count
                timestamp_column: "signup_date"
Notice that we already defined the timestamp_column at the model level, so we don’t have to define it in the test. This page has in-depth details on test priorities.

Copy the text below into your schema.yml file to add all of the above tests to our project:
version: 2

models:
  - name: customers
    description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
    config:
      tags: ["PII"]
      elementary:
        timestamp_column: "signup_date"
    data_tests:
      - elementary.volume_anomalies

    columns:
      - name: customer_id
        description: This is a unique identifier for a customer

      - name: first_name
        description: Customer's first name. PII.

      - name: last_name
        description: Customer's last name. PII.

      - name: first_order
        description: Date (UTC) of a customer's first order

      - name: most_recent_order
        description: Date (UTC) of a customer's most recent order

      - name: number_of_orders
        description: Count of the number of orders a customer has placed
        data_tests:
          - elementary.column_anomalies:
              config:
                severity: warn
                tags: ["column_anomalies"]
              arguments:
                column_anomalies:
                  - zero_count
                timestamp_column: "signup_date"

      - name: customer_lifetime_value
        description: Total value (AUD) of a customer's orders

      - name: customer_email
        description: Customer's email. PII.

      - name: signup_date
        description: Date (UTC) of a customer's signup to the online shop.

  - name: orders
    description: This table has basic information about orders, as well as some derived facts based on payments
    config:
      tags: ["finance"]
      elementary:
        timestamp_column: "order_date"

    data_tests:
      - elementary.dimension_anomalies:
          arguments:
            dimensions:
              - status
    columns:
      - name: order_id
        description: This is a unique identifier for an order

      - name: customer_id
        description: Foreign key to the customers table

      - name: order_date
        description: Date (UTC) that the order was placed

      - name: status
        description: '{{ doc("orders_status") }}'

      - name: amount
        description: Total amount (AUD) of the order

      - name: credit_card_amount
        description: Amount of the order (AUD) paid for by credit card

      - name: coupon_amount
        description: Amount of the order (AUD) paid for by coupon

      - name: bank_transfer_amount
        description: Amount of the order (AUD) paid for by bank transfer

      - name: gift_card_amount
        description: Amount of the order (AUD) paid for by gift card

  - name: returned_orders
    description: This table contains all of the returned orders
    config:
      tags: ["finance"]

    data_tests:
      - elementary.volume_anomalies:
          config:
            tags: ["table_anomalies"]
          arguments:
            timestamp_column: "order_date"

    columns:
      - name: order_id
        description: This is a unique identifier for an order

      - name: customer_id
        description: Foreign key to the customers table

      - name: order_date
        description: Date (UTC) that the order was placed

      - name: status
        description: '{{ doc("orders_status") }}'

      - name: amount
        description: Total amount (AUD) of the order

      - name: credit_card_amount
        description: Amount of the order (AUD) paid for by credit card

      - name: coupon_amount
        description: Amount of the order (AUD) paid for by coupon

      - name: bank_transfer_amount
        description: Amount of the order (AUD) paid for by bank transfer

      - name: gift_card_amount
        description: Amount of the order (AUD) paid for by gift card

You have now added Elementary tests to your project. Now let’s see how they are run and visualized.
I