dbt tests: You'll Catch Data Issues in Minutes, Not Days
Your VP just found a duplicate customer in the revenue dashboard. Again. You're scrambling through query history, checking five different models, wondering where the bad data snuck through.
It's 11 PM. You've been debugging for three hours.
Here's what this costs you: Three hours × $75/hour = $225. Twice a month? That's $5,400 yearly just for you. Multiply by your team of four? You're burning $21,600 annually on preventable data fires.
dbt tests catch these issues in seconds, before anyone sees them.
Software engineers validate code with automated tests before deploying to production. dbt brings the same concept to data pipelines: catch bad data before it reaches your dashboards.
The Difference
| Without dbt tests | With dbt tests |
|---|---|
| You discover issues when stakeholders flag them | You catch bad data the moment it enters your pipeline |
| You spend hours tracing bad data through your pipeline | Tests fail instantly, showing exactly which rows are wrong |
| Every deployment makes you nervous | You deploy with confidence |
Start with Four Critical Tests
Stop duplicate primary keys:
```yaml
- name: customer_id
  tests:
    - unique
```
No more inflated dashboard metrics.
Eliminate nulls in critical fields:
```yaml
- name: order_date
  tests:
    - not_null
```
Every row has the data you need.
Catch invalid values:
```yaml
- name: status
  tests:
    - accepted_values:
        values: ['completed', 'returned', 'placed', 'shipped']
```
Only valid statuses in your pipeline.
Validate foreign keys:
```yaml
- name: customer_id
  tests:
    - relationships:
        to: ref('stg_customers')
        field: customer_id
```
Every order belongs to a real customer.
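Each of these snippets is a column entry in a `schema.yml` file next to your models. Here's a sketch of how the four tests might fit together in one file (the model names `stg_customers` and `stg_orders` are assumptions; adjust them to your project):

```yaml
version: 2

models:
  - name: stg_customers        # assumed model name
    columns:
      - name: customer_id
        tests:
          - unique             # stop duplicate primary keys
          - not_null
  - name: stg_orders           # assumed model name
    columns:
      - name: order_date
        tests:
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['completed', 'returned', 'placed', 'shipped']
      - name: customer_id
        tests:
          - relationships:     # every order must point at a real customer
              to: ref('stg_customers')
              field: customer_id
```

Note that `unique` goes on the model where the column is the primary key (customers), while `relationships` goes on the model that references it (orders).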
Run Tests With One Command
```shell
dbt test --profiles-dir .
```
All tests run in parallel, complete in under 3 seconds, and report pass/fail for each test.
When a test fails, you see the exact failing rows. Fix it in 30 seconds instead of tracing queries for hours.
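Under the hood, each generic test compiles to a SQL query that selects the offending rows; a test passes when the query returns nothing. For the `unique` test, dbt generates something roughly like this (the table name `stg_customers` is an assumption, and the exact compiled SQL varies by dbt version and adapter):

```sql
-- Any customer_id appearing more than once is returned as a failure
select
    customer_id as unique_field,
    count(*) as n_records
from stg_customers
group by customer_id
having count(*) > 1
```

This is why failures are so fast to debug: the failing rows are the query result, not something you have to hunt for.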
Try It Yourself in 5 Minutes
This is my working example on GitHub. Clone it and run these commands:
```shell
git clone https://github.com/michael-oswald/dbt-practice
cd dbt-practice
docker-compose up -d
```
Seed the data, build the models, and run the tests:
```shell
dbt seed --profiles-dir .
dbt run --profiles-dir .
dbt test --profiles-dir .
```
You'll see all tests pass. Break one (add a duplicate ID to seeds/raw_customers.csv) and watch the test catch it instantly.
What You Gain
- Sleep through the night (no more 3 AM data fires)
- Deploy with confidence (tests catch issues before stakeholders do)
- Debug in minutes, not hours
- Save ~$21,600 yearly in wasted engineering time
What it costs: 10 minutes to add tests to each model.
Your next data quality issue is coming this week. Add dbt tests before it arrives.
Enjoyed this post? Get more like it.
Subscribe to get my latest posts about data engineering, AI, and modern data stack delivered to your inbox.