
dbt tests: You'll Catch Data Issues in Minutes, Not Days

Michael Oswald · 3 min read
#dbt #testing #data-quality #data-engineering #sql

Your VP just found a duplicate customer in the revenue dashboard. Again. You're scrambling through query history, checking five different models, wondering where the bad data snuck through.

It's 11 PM. You've been debugging for three hours.

Here's what this costs you: Three hours × $75/hour = $225. Twice a month? That's $5,400 yearly just for you. Multiply by your team of four? You're burning $21,600 annually on preventable data fires.

dbt tests catch these issues in seconds, before anyone sees them.

Software engineers validate code with automated tests before deploying to production. dbt brings the same concept to data pipelines: each test compiles to a SQL query that looks for rows violating an assertion, and the test passes only when that query comes back empty. Bad data gets caught before it reaches your dashboards.

The Difference

| Without dbt tests | With dbt tests |
| --- | --- |
| You discover issues when stakeholders flag them | You catch bad data the moment it enters your pipeline |
| You spend hours tracing bad data through your pipeline | Tests fail instantly, showing exactly which rows are wrong |
| Every deployment makes you nervous | You deploy with confidence |

Start with Four Critical Tests

Stop duplicate primary keys:

- name: customer_id
  tests:
    - unique

No more inflated dashboard metrics.
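
If the table has known historical duplicates you're still cleaning up, you can downgrade the test from error to warning instead of deleting it. A minimal sketch using dbt's built-in severity config, on the same customer_id column as above:

- name: customer_id
  tests:
    - unique:
        config:
          severity: warn  # report duplicates without failing the run

Switch it back to the default error severity once the backfill is clean.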

Eliminate nulls in critical fields:

- name: order_date
  tests:
    - not_null

Every row has the data you need.
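
Some columns are only required in certain rows. dbt's test configs accept a where clause for that; a sketch, assuming a hypothetical shipped_at column that must be populated only once an order ships:

- name: shipped_at
  tests:
    - not_null:
        config:
          where: "status = 'shipped'"  # only shipped orders need a ship date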

Catch invalid values:

- name: status
  tests:
    - accepted_values:
        values: ['completed', 'returned', 'placed', 'shipped']

Only valid statuses in your pipeline.
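
accepted_values compares values as strings by default. For a numeric column, turn quoting off; a sketch, assuming a hypothetical priority column coded 1 through 3:

- name: priority
  tests:
    - accepted_values:
        values: [1, 2, 3]
        quote: false  # compare as numbers, not strings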

Validate foreign keys:

- name: customer_id
  tests:
    - relationships:
        to: ref('stg_customers')
        field: customer_id

Every order belongs to a real customer.
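
Each snippet above is a column-level entry. In a real project they all live together in a schema.yml next to your models; here's how the four tests might look combined on a hypothetical stg_orders model (the model and column names are assumptions, not taken from the repo):

version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_date
        tests:
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['completed', 'returned', 'placed', 'shipped']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id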

Run Tests With One Command

dbt test --profiles-dir .

dbt runs the tests in parallel and prints a pass/fail status for each one; on a project this small, the whole suite finishes in a few seconds. You can also target a single model with dbt test --select <model_name>.

When a test fails, dbt reports how many rows violated the assertion and writes the compiled query to the target/ directory, so you can run it and see the exact offending rows. Most fixes take minutes, not hours.
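
If you'd rather query failing rows directly in your warehouse, dbt can persist them for you. A minimal sketch using the built-in store_failures config:

- name: customer_id
  tests:
    - unique:
        config:
          store_failures: true  # save failing rows to an audit table

By default the rows land in a dbt_test__audit schema, ready to inspect with plain SQL.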

Try It Yourself in 5 Minutes

I've put a working example on GitHub. Clone it and run these commands:

git clone https://github.com/michael-oswald/dbt-practice
cd dbt-practice
docker-compose up -d

Run the tests:

dbt seed --profiles-dir .
dbt run --profiles-dir .
dbt test --profiles-dir .

You'll see all tests pass. Break one (add a duplicate ID to seeds/raw_customers.csv) and watch the test catch it instantly.
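
Seeds can carry tests too, so the raw CSVs are checked the moment they load. A sketch, assuming raw_customers has an id column (check the CSV header in the repo for the real name):

seeds:
  - name: raw_customers
    columns:
      - name: id
        tests:
          - unique
          - not_null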

What You Gain

  • Sleep through the night (no more 3 AM data fires)
  • Deploy with confidence (tests catch issues before stakeholders do)
  • Debug in minutes, not hours
  • Save ~$21,600 yearly in wasted engineering time

What it costs: 10 minutes to add tests to each model.

Your next data quality issue is coming this week. Add dbt tests before it arrives.

Enjoyed this post? Get more like it.

Subscribe to get my latest posts about data engineering, AI, and the modern data stack delivered to your inbox.