training_period
training_period:
  period: < time period > # supported periods: day, week, month
  count: < number of periods >
The maximum timeframe for which the test collects data. This timeframe includes both the training period and the detection period. If a detection delay is defined, the whole training period is delayed as well.
- Default: 14 days
- Relevant tests: Anomaly detection tests with timestamp_column
How does it work?
The training_period parameter only applies to tests that have a timestamp_column configured.
It works differently according to the table materialization:
- Regular tables and views - The values for the full training_period are calculated on each run.
- Incremental models and sources - The values for the full training_period are calculated on the first test run and on full refresh. Subsequent test runs calculate only the values for the detection_period.
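The materialization rule above can be sketched as a small decision function. This is only an illustration in Python, not Elementary's implementation: the function name, its parameters, and the 2-day detection period used in the example are hypothetical.

```python
def days_calculated(is_incremental: bool,
                    first_run_or_full_refresh: bool,
                    training_days: int,
                    detection_days: int) -> int:
    """Return how many days of values a single test run calculates."""
    # Regular tables and views: the full training_period on every run.
    if not is_incremental:
        return training_days
    # Incremental models and sources: the full training_period only on the
    # first test run or after a full refresh.
    if first_run_or_full_refresh:
        return training_days
    # Every other incremental run: only the detection_period.
    return detection_days


# Example with a 14-day training_period and a hypothetical 2-day detection period.
view_run = days_calculated(False, False, 14, 2)
incremental_first_run = days_calculated(True, True, 14, 2)
incremental_later_run = days_calculated(True, False, 14, 2)
```

In this sketch a view recalculates all 14 days on every run, while an incremental model does so only on its first run or a full refresh and otherwise calculates just the 2 detection days.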
Changes from default:
- Full time buckets - Elementary will automatically increase the training_period to ensure full time buckets. For example, if the time_bucket of the test is period: week, and a 14-day training_period starts on a Tuesday, the test will collect 2 more days back to complete a week (starting on Sunday).
- Seasonality training set - If seasonality is configured, Elementary will automatically increase the training_period to ensure there are enough training set values to calculate an anomaly. For example, if the seasonality of the test is day_of_week, training_period will be increased to ensure enough Sundays, Mondays, Tuesdays, etc. to calculate an anomaly for each.
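The arithmetic behind both adjustments can be reproduced with a short Python sketch. It is not Elementary's code: the helper align_to_week_start and the run date are hypothetical, and the week is assumed to start on Sunday, as in the example above.

```python
from collections import Counter
from datetime import date, timedelta


def align_to_week_start(start: date, week_starts_on: int = 6) -> date:
    """Move `start` back to the most recent week start.

    `week_starts_on` uses date.weekday() numbering (Monday=0 ... Sunday=6).
    The default of 6 matches the Sunday-based week in the example above.
    """
    days_back = (start.weekday() - week_starts_on) % 7
    return start - timedelta(days=days_back)


# Full time buckets: a 14-day window that starts on a Tuesday.
run_date = date(2024, 1, 16)                    # hypothetical run date (a Tuesday)
raw_start = run_date - timedelta(days=14)       # 2024-01-02, also a Tuesday
aligned_start = align_to_week_start(raw_start)  # the preceding Sunday
extra_days = (raw_start - aligned_start).days   # days added to the window

# Seasonality: with day_of_week, each weekday forms its own training set.
# Count how many samples of each weekday a plain 14-day window contains.
window = [raw_start + timedelta(days=i) for i in range(14)]
samples_per_weekday = Counter(day.weekday() for day in window)
```

Here extra_days is 2, which matches the example above. The 14-day window holds only 2 samples of each weekday, which shows why a longer window is needed when day_of_week seasonality is configured.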
The impact of changing training_period
If you increase training_period, your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. The result is a lower chance of false positive anomalies, but also lower sensitivity, because anomalies have a higher threshold.
If you decrease training_period, your test training set will be smaller. This means a smaller sample size for calculating the expected range, which might make the test more sensitive to outliers. The result is a higher chance of false positive anomalies, but also higher sensitivity, because anomalies have a lower threshold.
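The effect of sample size on outlier sensitivity can be seen in a toy Python calculation. It covers only the outlier part of the trade-off, uses made-up metric values, and uses a plain average as a stand-in for the expected range calculation. That stand-in is an assumption, since this page does not specify the formula.

```python
from statistics import mean

baseline = [98, 100, 102]   # made-up "normal" metric values, centered on 100
outlier = 200               # a single made-up outlier in the training set

small_training_set = baseline * 2 + [outlier]   # 7 values
large_training_set = baseline * 9 + [outlier]   # 28 values

# How far the single outlier pulls the average away from the normal level of 100.
small_shift = mean(small_training_set) - 100
large_shift = mean(large_training_set) - 100
```

With 7 values the outlier moves the average by about 14.3, while with 28 values it moves it by only about 3.6. A longer training set dilutes the influence of any single unusual value.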