Data anomaly detection dbt tests
The Elementary dbt package includes anomaly detection tests, implemented as dbt tests. These tests can detect anomalies in volume, freshness, null rates, and specific dimensions, among others. The tests are configured and executed like any other tests in your project.
Table (model / source) tests
- Volume anomalies
elementary.volume_anomalies monitors the row count of your table over time, per time bucket. If configured without a timestamp_column, it counts the total rows of the table.
- Freshness anomalies
elementary.freshness_anomalies monitors the freshness of your table over time, as the expected time between data updates. It requires a timestamp_column configuration.
- Event freshness anomalies
elementary.event_freshness_anomalies monitors the freshness of event data over time, as the expected time it takes each event to load: the time between when the event actually occurs (the event timestamp) and when it is loaded to the database (the update timestamp). Configuring event_timestamp_column is required; update_timestamp_column is optional.
- Dimension anomalies
elementary.dimension_anomalies monitors the frequency of values in the configured dimension over time, and alerts on unexpected changes in the distribution. The test counts rows grouped by the given dimensions (columns or expressions), so it is best configured on low-cardinality fields.
- All columns anomalies
elementary.all_columns_anomalies executes column-level monitors and anomaly detection on all the columns of the table. Specific monitors are detailed here.
You can use the column_anomalies parameter to override the default monitors, and exclude_prefix / exclude_regexp to exclude columns from the test.
Column tests
- Column anomalies
elementary.column_anomalies executes column-level monitors and anomaly detection on the column. Specific monitors are detailed here and can be configured using the column_anomalies configuration.
Adding tests examples
Configure your elementary anomaly detection tests
If your data set has a timestamp column that represents the creation time of a field, it is highly recommended to configure it as a timestamp_column.
To execute these tests in a dedicated run, tag them and use the selection parameter --select tag:elementary.
If you wish to only be warned on anomalies, configure the severity of the tests to warn.
What happens on each test?
Upon running a test, your data is split into time buckets based on the time_bucket field and is limited by the training_period var. The test then compares a certain metric (e.g. row count) of the buckets that are within the detection period to the same metric of all the previous time buckets within the training_period.
If there were any anomalies in the detection period, the test will fail.
On each test, the Elementary package executes the relevant monitors and searches for anomalies by comparing the results to historical metrics.
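The bucket comparison described above can be illustrated with a small Python sketch. It is not the package's implementation: the function name, its parameters, the daily bucket size, and the z-score rule with a threshold of 3 are all assumptions made for illustration. It counts rows per day, treats the most recent days as the detection period, and compares each of them to the earlier days in the training window.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, stdev


def find_anomalous_buckets(row_dates, today, training_days=14,
                           detection_days=2, threshold=3.0):
    """Return detection-period days whose row count deviates from the baseline.

    Hypothetical helper: names, defaults and the z-score rule are assumptions.
    """
    if training_days - detection_days < 2:
        raise ValueError("need at least two baseline buckets")

    # One time bucket per day: count the rows that fall on each date.
    counts = Counter(row_dates)

    # Only buckets inside the training window are considered at all.
    days = [today - timedelta(days=i) for i in range(training_days, 0, -1)]
    baseline_days = days[:-detection_days]
    detection_days_list = days[-detection_days:]

    baseline = [counts.get(d, 0) for d in baseline_days]
    mu, sigma = mean(baseline), stdev(baseline)

    anomalies = []
    for d in detection_days_list:
        value = counts.get(d, 0)
        if sigma == 0:
            # Perfectly flat history: any change counts as a deviation.
            is_anomaly = value != mu
        else:
            # Assumed scoring rule: distance from the mean in standard deviations.
            is_anomaly = abs(value - mu) / sigma > threshold
        if is_anomaly:
            anomalies.append(d)
    return anomalies


# Usage: steady volume of about 100 rows per day, then a drop on the last day.
today = date(2024, 1, 15)
rows = []
for i in range(14, 0, -1):
    n = 10 if i == 1 else 100 + (i % 3)
    rows.extend([today - timedelta(days=i)] * n)

anomalies = find_anomalous_buckets(rows, today)
print(anomalies)  # the 2024-01-14 bucket is flagged
```

A test built on this idea would fail whenever the returned list is non-empty, which mirrors the rule that any anomaly in the detection period fails the test.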
To learn more, refer to core concepts.