min_training_set_size: [int]

The minimal amount of data points a test requires for calculating and detecting an anomaly. It’s recommended not to configure a value smaller than 14, so the result could be statistically significant.

  • Default: 14
  • Relevant tests: All anomaly detection tests
min_training_set_size change impact

min_training_set_size change impact

models:
  - name: this_is_a_model
    tests:
      - elementary.volume_anomalies:
          min_training_set_size: 20

How it works?

If the test won’t have at least min_training_set_size it will pass, as there isn’t enough data to determine if there is an anomaly. The Elementary report will show a message saying “Not enough data to calculate anomaly score” instead of a graph.

The impact of changing min_training_set_size

If you increase min_training_set_size your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. This means less chance of false positive anomalies, but also less sensitivity so anomalies have a higher threshold.

If you decrease min_training_set_size your test training set will be smaller. This means a smaller sample size for calculating the expected range, which might make the test more sensitive to outliers. This means more chance of false positive anomalies, but also more sensitivity as anomalies have a lower threshold.