Elementary OSS and Elementary Cloud Platform both offer data anomaly detection, but their implementations differ significantly.

There are two types of anomaly detection tests:

  • Pipeline health monitors - Monitor pipeline runs to ensure timely and complete data ingestion and transformation. These monitors track metadata to detect volume and freshness issues.

  • Data quality metrics tests - Run as part of the pipeline and collect metrics by querying the data itself. The metrics include nullness, cardinality, average, length, and others.
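
To make the data quality metrics concrete, here is a minimal sketch of how metrics such as nullness, cardinality, and average length could be computed for a single column. It is an illustration only, not Elementary's implementation: the function `column_metrics` and the metric names in the returned dictionary are hypothetical, and in practice these metrics are collected by querying the warehouse rather than in Python.

```python
from statistics import mean


def column_metrics(values):
    """Compute illustrative quality metrics for one column of string values.

    Hypothetical helper: the metric names are assumptions for this sketch.
    """
    non_null = [v for v in values if v is not None]
    total = len(values)
    return {
        # Nullness: share of rows with a missing value, as a percentage.
        "null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Cardinality: number of distinct non-null values.
        "distinct_count": len(set(non_null)),
        # Average length of the non-null values (None if there are none).
        "average_length": mean(len(v) for v in non_null) if non_null else None,
    }


metrics = column_metrics(["a", "bb", None, "bb"])
```

Each metric becomes one data point per run; anomaly detection then compares the newest data point with the metric's history.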

Here is a comparison between the implementation of these tests in Elementary Cloud and OSS:

Pipeline Health Monitors - Freshness and Volume

|                     | OSS                                           | Cloud                                            |
|---------------------|-----------------------------------------------|--------------------------------------------------|
| Implementation      | dbt tests                                     | Elementary Cloud monitors                        |
| Tests execution     | Run in dbt                                    | Run in Cloud                                     |
| Coverage            | Manually added in code                        | Automated, out-of-the-box full coverage          |
| Configuration       | Manual, many parameters required for accuracy | No configuration, automated ML models            |
| Detection mechanism | Z-score, statistical                          | ML anomaly detection, various models             |
| What is monitored?  | Data                                          | Metadata (query history, information schema)     |
| Time to detection   | Only when dbt runs                            | As soon as the problem happens, including sources |
| Cost                | DWH compute                                   | No cost, only metadata is leveraged              |

Data Quality Metrics

|                     | OSS                                           | Cloud                                                 |
|---------------------|-----------------------------------------------|-------------------------------------------------------|
| Implementation      | dbt tests                                     | Metrics collection in dbt, Elementary Cloud monitors  |
| Tests execution     | Run in dbt                                    | Metrics collection in dbt, detection in Cloud         |
| Coverage            | Manually added in code                        | Opt-in, can be added in bulk in Cloud                 |
| Configuration       | Manual, many parameters required for accuracy | Automated ML models                                   |
| Detection mechanism | Z-score, statistical                          | ML anomaly detection, various models                  |
| What is monitored?  | Data                                          | Data                                                  |
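
The Z-score detection mechanism listed for OSS can be sketched as follows. This is a generic statistical illustration under stated assumptions, not Elementary's actual code: the function names, the sample row counts, and the sensitivity threshold of 3 standard deviations are all chosen for this example.

```python
from statistics import mean, stdev


def z_score(history, latest):
    """Return how many sample standard deviations `latest` is from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two historical data points
    if sigma == 0:
        # A perfectly flat history: any deviation is infinitely unusual.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


def is_anomaly(history, latest, sensitivity=3.0):
    """Flag `latest` when its absolute Z-score exceeds `sensitivity` (assumed threshold)."""
    return abs(z_score(history, latest)) > sensitivity


# Hypothetical daily row counts for a table (a volume metric).
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

drop_detected = is_anomaly(daily_row_counts, 400)      # sharp drop in volume
normal_detected = is_anomaly(daily_row_counts, 1012)   # within the usual range
```

A fixed threshold like this works only as well as its tuning (sensitivity, length of history, handling of seasonality), which is consistent with the "many parameters required for accuracy" entry in the Configuration row.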