models:
  - name: this_is_a_model
    tests:
      - elementary.volume_anomalies:
          training_period:
            period: day
            count: 30
training_period:
  period: < time period > # supported periods: day, week, month
  count: < number of periods >

The maximal timeframe for which the test will collect data. This timeframe includes the training period and detection period. If a detection delay is defined, the whole training period is being delayed.

  • Default: 14 days
  • Relevant tests: Anomaly detection tests with timestamp_column
Training Period
models:
  - name: this_is_a_model
    tests:
      - elementary.volume_anomalies:
          training_period:
            period: day
            count: 30

How it works?

The training_period param only works for tests that have timestamp_column configuration.

It works differently according to the table materialization:

  • Regular tables and views - The values of the full training_period period is calculated on each run.
  • Incremental models and sources - The values of the full training_period period is calculated on the first test run, and on full refresh. The following test runs will only calculate the values of the detection_period period.

Changes from default:

  • Full time buckets - Elementary will increase the training_period automatically to insure full time buckets. For example if the time_bucket of the test is period: week, and 14 days training_period result in Tuesday, the test will collect 2 more days back to complete a week (starting on Sunday).
  • Seasonality training set - If seasonality is configured, Elementary will increase the training_period automatically to ensure there are enough training set values to calculate an anomaly. For example if the seasonality of the test is day_of_week, training_period will be increased to ensure enough Sundays, Mondays, Tuesdays, etc. to calculate an anomaly for each.

The impact of changing training_period

If you increase training_period your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. This means less chance of false positive anomalies, but also less sensitivity so anomalies have a higher threshold.

If you decrease training_period your test training set will be smaller. This means a smaller sample size for calculating the expected range, which might make the test more sensitive to outliers. This means more chance of false positive anomalies, but also more sensitivity as anomalies have a lower threshold.