One of the challenges data teams face is tracking, understanding, and collaborating on the status of data issues. Tests fail daily, pipelines run frequently, and alerts are sent to different channels. Teams need a centralized place to track:

  • What data issues are open? Which issues were already resolved?
  • Who is on it, and what’s the latest status?
  • Are multiple failures part of the same issue?
  • What actions and events happened since the incident started?
  • Has this issue happened before? Who resolved it, and how?

Elementary answers these questions with Incidents.

A comprehensive view of all incidents is available on the Incidents page.

How do incidents work?

Every failure or warning in Elementary automatically opens a new incident or is added as an event to an ongoing incident. Grouping rules determine which failures are grouped into the same incident.

An incident has a status, an assignee, and a severity. These can be set on the Incidents page, or from an alert in integrations that support alert actions.


How are incidents resolved?

Each incident starts at its first failure and ends when its status is changed to Resolved, either manually or automatically. An incident is resolved automatically when the failing tests, monitors, and/or models succeed again.
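
The lifecycle described above can be illustrated with a minimal Python sketch. This is not Elementary's implementation or API: the `Incident` class, its field names, the `"open"` / `"resolved"` status strings, the `"normal"` severity default, and the test names are all hypothetical and exist only to show the idea of auto-resolving once nothing is failing.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Incident:
    """Hypothetical in-memory model; not Elementary's actual schema."""
    status: str = "open"
    assignee: Optional[str] = None
    severity: str = "normal"  # assumed default, for illustration only
    # Names of the tests, monitors, or models that are still failing.
    failing: Set[str] = field(default_factory=set)

    def record_result(self, name: str, passed: bool) -> None:
        """Apply one result and auto-resolve when nothing is failing."""
        if self.status == "resolved":
            return  # a resolved incident no longer accepts results
        if passed:
            self.failing.discard(name)
        else:
            self.failing.add(name)
        if not self.failing:
            self.status = "resolved"

    def resolve_manually(self) -> None:
        """Mirror a user setting the status to Resolved by hand."""
        self.status = "resolved"


# The incident starts at the first failure.
incident = Incident(failing={"not_null_orders_id"})
incident.record_result("unique_orders_id", passed=False)
incident.record_result("not_null_orders_id", passed=True)
still_open = incident.status    # one test is still failing
incident.record_result("unique_orders_id", passed=True)
final_status = incident.status  # every failing test has passed again
```

In this sketch the incident stays open while any tracked item is failing, which matches the rule that automatic resolution requires the failing tests, monitors, and/or models to succeed again.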

Incident grouping rules

Different failures and warnings are grouped into the same incident according to the following rules:

  1. Additional failures of the same test / monitor on a table that has an active incident.
  2. _Coming soon_ Freshness and volume issues that are downstream of an open incident on a model failure.
  3. _Coming soon_ Failures of the same test / monitor that are on downstream tables of an active incident.
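
As a rough illustration of rule 1 only, the sketch below groups failures by a (table, test) key, so a repeated failure of the same test on the same table joins the existing incident instead of opening a new one. The key shape, the `group_failures` function, and the table and test names are assumptions made for this example, not Elementary's actual logic. The sketch also ignores resolution and treats every incident as still active.

```python
from typing import Dict, List, Tuple

# Hypothetical grouping key for rule 1: (table name, test or monitor name).
IncidentKey = Tuple[str, str]


def group_failures(failures: List[IncidentKey]) -> Dict[IncidentKey, List[int]]:
    """Map each (table, test) key to the indexes of the failures grouped under it."""
    active: Dict[IncidentKey, List[int]] = {}
    for index, key in enumerate(failures):
        # setdefault opens a new incident on the first failure for this key
        # and returns the existing event list on later failures.
        active.setdefault(key, []).append(index)
    return active


failures = [
    ("orders", "not_null_id"),
    ("orders", "not_null_id"),      # same test, same table: joins the first incident
    ("customers", "unique_email"),  # different table and test: opens a new incident
]
incidents = group_failures(failures)
```

Here the three failures produce two incidents, with the first two failures recorded as events of the same one.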

Incident deep dive

_Coming soon_