Learn by Directing AI
Unit 5

Add layered quality testing with dbt and Soda Core

Step 1: Add dbt tests

The transformation models are producing results. The re-dye rate matches Roberto's estimate. Now add the defenses that catch problems before they reach the dashboard.

Start with dbt tests on the staging and intermediate models. These are the row-level checks you know from previous projects:

  • Structural tests: unique and not_null on batch_id (the primary key). not_null on line_number, fabric_type, operator_id.
  • Accepted values: line_number in (1, 2, 3). fabric_type in ('polyester', 'cotton_blend').
  • Business logic test: After normalization, the color match score should be between 0 and 100. A custom test that checks this range catches normalization formula errors.
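In a dbt project, tests like these typically live in a YAML schema file plus one custom generic test. A sketch, where the model name stg_dye_batches, the score column color_match_score, and the file paths are assumptions -- your actual names will differ:

```yaml
# models/staging/schema.yml -- model and column names are placeholders
version: 2

models:
  - name: stg_dye_batches
    columns:
      - name: batch_id
        tests: [unique, not_null]
      - name: line_number
        tests:
          - not_null
          - accepted_values:
              values: [1, 2, 3]
              quote: false
      - name: fabric_type
        tests:
          - not_null
          - accepted_values:
              values: ['polyester', 'cotton_blend']
      - name: operator_id
        tests: [not_null]
      - name: color_match_score
        tests:
          - score_between_0_and_100
```

```sql
-- tests/generic/score_between_0_and_100.sql
-- Returns the rows that violate the range; dbt fails the test if any exist
{% test score_between_0_and_100(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} < 0
   or {{ column_name }} > 100
{% endtest %}
```

Note `quote: false` on the line_number values -- without it, dbt compares the integer column against quoted strings.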

Direct Claude to add these tests. dbt tests verify that individual values are correct at a specific point in time -- "this batch ID is unique," "this score is not negative." They catch specific, row-level violations.

What they do not catch: whether today's batch as a whole is normal. A day where every individual row passes dbt tests but the total batch count is half the expected volume -- because an entire production line went down -- looks fine to dbt. Every row is valid. The batch is anomalous.

Step 2: Understand Soda Core

Soda Core catches the failures that dbt tests cannot. Open materials/soda-core-guide.md and read through it.

Soda Core is a quality monitoring tool that checks batch-level patterns: row counts over time, distributions, statistical bounds. Where dbt tests ask "is this row correct?", Soda Core checks ask "is this batch normal compared to yesterday?"

The two tools serve different roles. dbt tests verify row-level data properties. Soda Core verifies batch-level patterns. Running both is layered defense, not redundancy. A pipeline where all dbt tests pass can still have an anomalous batch that only trend analysis catches. A pipeline where Soda Core shows normal volumes can still have individual rows with invalid values that only dbt tests catch.

Install Soda Core following the guide. Configure it to connect to your DuckDB database.
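The guide is the authority on setup; as a rough sketch, DuckDB support comes from the soda-core-duckdb package (`pip install soda-core-duckdb`), and the connection lives in a configuration file. The data source here is named duckdb to match the scan command in Step 5; the database path is a placeholder:

```yaml
# configuration.yml -- path is a placeholder for your actual database file
data_source duckdb:
  type: duckdb
  path: warehouse.duckdb
```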

Step 3: Write Soda Core trend checks

With Soda Core installed, write the check files. You need at least:

  • Row count check: Today's batch count should be within a reasonable range of the expected daily volume.
  • Trend check: The batch count should not deviate more than N% from the previous day.
  • Average score check: The average normalized color match score should fall within a range that reflects normal operations.
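A sketch of these checks in SodaCL. The table name, score column, and every numeric bound are placeholders you will calibrate in Step 4; the change-over-time check is from the SodaCL docs and requires historic measurements, which Soda Core stores via Soda Cloud:

```yaml
# checks.yml -- bounds are placeholders, to be calibrated in Step 4
checks for fct_dye_batches:
  # Expected daily volume: ~90 batches across three lines
  - row_count between 72 and 108
  # Day-over-day drift (needs Soda Cloud to store check history)
  - change percent for row_count < 20
  # Average normalized score under normal operations
  - avg(color_match_score) between 80 and 95
```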

Direct Claude to write these checks. Pay attention to the bounds and thresholds Claude proposes.

AI commonly generates statistical bounds from data profiling -- it looks at the current data, calculates the mean and standard deviation, and sets bounds at mean plus-or-minus two standard deviations. This is circular. The bounds describe what the data currently looks like, not what it should look like. If the data is already degraded, the bounds accept degraded data as normal.

The right bounds come from understanding Roberto's business. He told you: three lines, about 30 batches per line per day, so roughly 90 batches total. Batch volumes vary somewhat. Maintenance days (monthly) drop the count because a line goes offline.

Step 4: Calibrate the thresholds

This is where domain knowledge matters most.

Roberto's daily batch count runs between 75 and 95. AI will likely suggest a 20% variance threshold -- a round number that sounds reasonable. With 90 as the baseline, 20% means anything between 72 and 108 passes. That seems fine for normal days.

But Roberto also mentioned maintenance days. On those days, one line is offline and batch volume drops to around 50-60. A 20% threshold would flag every maintenance day as an anomaly. The check would fire monthly, the team would learn to ignore it, and an ignored quality gate is worse than no quality gate at all.

The trade-off: a wider threshold (35%) accommodates maintenance days but might miss a real production problem that drops volume to 65. A tighter threshold (20%) catches real problems but produces monthly false positives. Neither answer is wrong. The right choice depends on what Roberto values more -- catching every anomaly or avoiding false alarms.

Think about what your specific threshold catches and what it misses. If the batch count dropped to 40 (a line went down unexpectedly), would your check catch it? What about a drop to 70?
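To make the trade-off concrete, here is the arithmetic behind the two candidate bands (the table name is a placeholder; only one check would ship):

```yaml
# Against a ~90-batch baseline:
#   20% -> +/- 18   -> band [72, 108]: catches drops to 40, 65, and 70,
#                      but flags every maintenance day (~50-60)
#   35% -> +/- 31.5 -> band [58.5, 121.5]: catches a drop to 40, but a
#                      real drop to 65 or 70 passes unnoticed
checks for fct_dye_batches:
  - row_count between 72 and 108
```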

Step 5: Run the full test suite

Run dbt tests and Soda Core checks together:

  dbt test
  soda scan -d duckdb -c configuration.yml checks.yml

Both should pass on the current data. The Soda Core output shows something different from dbt's output -- not just pass/fail, but the actual measured values alongside the thresholds.

Review the Soda Core output. Each check shows the measured value, the threshold, and the result. This quantitative context is what trend analysis provides that row-level testing does not.

Step 6: Cross-check the test strategy

Send your test strategy to a second AI model. Ask it to review against Roberto's business requirements. Does the strategy catch the failures Roberto would care about?

Consider: Roberto needs to track re-dye rates over time. He needs to compare quality across lines. He needs to filter by fabric type. Do your quality checks protect these use cases? If the normalization macro produced wrong results, would a test catch it? If a line's data stopped arriving, would a check notice?

The cross-check is not about finding every possible failure. It is about confirming that your defenses cover the failures that would hurt Roberto's business -- a contract review based on wrong numbers, a re-dye rate that looks good because half the data is missing, a quality comparison across lines that uses unconverted temperatures.

✓ Check

If tomorrow's batch count dropped to 40 (half the normal volume because a line went down), would your Soda Core check catch it? What if it dropped to 70?