Step 1: Think about test strategy as architecture
Until now, you've added tests as you built models -- a uniqueness test here, a not-null test there. That works for individual models but doesn't answer the harder question: what catches what, and where?
A quality testing strategy is a deliberate design. Staging tests validate source data against expectations. Intermediate tests verify transformation logic. Mart tests verify business rules and consumer contracts. Each layer catches a different class of failure, and each has a different cost of false positives.
The question is not "how many tests do I have?" but "what failure would I not catch?"
Step 2: Design staging-layer tests
Staging tests are boundary defenses. They catch problems at the point where external data enters your pipeline.
For each staging model, design tests that validate:
- Schema presence: expected columns exist (catches field renames from source systems)
- Data types: quantities are numeric, dates are dates (catches type changes that produce silent nulls)
- Value ranges: delivery quantities are positive, prices are within realistic ranges
- Null patterns: columns that should never be null vs columns where nulls are legitimate (billing_status has legitimate nulls)
Direct AI to create these tests. AI commonly generates tests only at the mart layer because that's where the final output lives. But a field rename caught at staging is one failure. The same rename caught at the mart layer is twenty failures -- every downstream model inherited the problem.
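As a sketch, the staging checks above might be expressed in a dbt schema file like this (the model and column names are assumptions, and the range test requires the dbt_utils package):

```yaml
# models/staging/_stg_deliveries.yml (hypothetical model and column names)
version: 2

models:
  - name: stg_deliveries
    columns:
      - name: delivery_qty
        tests:
          - not_null
          - dbt_utils.accepted_range:    # from the dbt_utils package
              min_value: 0
              inclusive: false           # quantities must be strictly positive
      - name: delivery_date
        tests:
          - not_null
      - name: billing_status
        # deliberately no not_null test: nulls are legitimate here
```

A test that fails here, at the boundary, localizes the problem to the source system before any downstream model inherits it.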
Step 3: Design intermediate-layer tests
Intermediate tests verify transformation logic. After joining data from four factories into unified views, design tests that validate:
- Join key correctness: no orphaned records (deliveries without matching materials). A LEFT JOIN that drops records means material code resolution failed for some rows.
- Material code resolution completeness: every row has a standard material code. Rows with NULL standard codes mean the mapping table was incomplete.
- Deduplication verification: no unintended duplicates from the union of four sources.
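One way to express the resolution-completeness check is a dbt singular test: a SQL file under tests/ that fails if it returns any rows (model and column names here are assumptions):

```sql
-- tests/assert_material_codes_resolved.sql (hypothetical names)
-- Fails if any unified row lacks a standard material code,
-- i.e. the mapping table was incomplete for that row.
select
    delivery_id,
    source_factory,
    raw_material_code
from {{ ref('int_deliveries_unified') }}
where standard_material_code is null
```

The orphaned-records and deduplication checks follow the same pattern: a query that selects only the rows that violate the rule.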
Step 4: Design mart-layer tests
Mart tests verify business rules and consumer contracts:
- Business logic validation: cost attribution totals match source sums. If the sum of fct_cost_attribution.total_kwd doesn't match the sum across all staging models, something was lost or duplicated in transformation.
- Freshness constraints: mart tables should be no older than a defined threshold.
- Consumer contracts: the columns and types the CFO's reports depend on must be present and valid.
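The reconciliation rule can be sketched as another singular test; a small tolerance absorbs rounding differences (all model and column names are assumptions):

```sql
-- tests/assert_cost_attribution_reconciles.sql (hypothetical names)
-- Fails if the mart total drifts from the combined staging totals.
with mart_total as (
    select sum(total_kwd) as amount
    from {{ ref('fct_cost_attribution') }}
),

staging_total as (
    select sum(cost_kwd) as amount
    from (
        select cost_kwd from {{ ref('stg_factory_a__costs') }}
        union all
        select cost_kwd from {{ ref('stg_factory_b__costs') }}
        -- union all ... the remaining factories
    ) as combined
)

select *
from mart_total, staging_total
where abs(mart_total.amount - staging_total.amount) > 0.01
```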
Step 5: Run coverage analysis
After implementing tests across all three layers, run a coverage analysis. Direct AI to report:
- Which models have tests? Which don't?
- Which models have only structural tests (unique, not_null) but no business logic tests?
- Which business rules in Fatimah's requirements are verified by tests? Which are not?
The coverage analysis is professional judgment about risk. 100% coverage with tautological tests (testing that a column is not null when the schema already enforces NOT NULL) is worse than 60% coverage of the things that could actually go wrong.
Identify the gaps. What failure could happen right now that no test would catch?
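dbt's node selection can give a quick per-layer count as a starting point; a sketch, assuming a conventional models/staging, models/intermediate, models/marts folder layout:

```
# Count tests attached to each layer (folder names are assumptions)
dbt ls --resource-type test --select staging | wc -l
dbt ls --resource-type test --select intermediate | wc -l
dbt ls --resource-type test --select marts | wc -l
```

Counts alone say nothing about tautological tests, so follow the counts with the risk questions above.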
Step 6: Set up quality metrics tracking
Tests that pass today might become flaky tomorrow. Set up tracking for:
- Test pass rates over time
- Failure frequency by test (which tests fail most often?)
- Flaky test identification (tests that fail intermittently without a clear cause)
A test that fails every Tuesday and gets manually overridden is not a quality gate. It's noise that erodes trust in the entire testing infrastructure. Flaky tests that the team ignores are worse than no tests at all -- they create false confidence.
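For tracking over time, dbt writes a run_results.json artifact after every invocation, and it can persist the rows that made each test fail. A minimal project-config sketch (the store_failures and schema configs are standard dbt test configs; the schema name is an assumption):

```yaml
# dbt_project.yml (fragment)
tests:
  +store_failures: true     # persist failing rows to audit tables
  +schema: test_failures    # land them in a dedicated schema
```

Pass rates and failure frequency can then be built by collecting run_results.json from each run, and flaky tests show up as tests whose failing rows differ run to run.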
Step 7: Document the quality strategy
Write a quality strategy document that communicates: what's tested, at what layer, with what coverage, and what remains undefended.
This is professional documentation -- not a list of test names, but a description of the architecture. A new engineer reading this document should understand why specific thresholds were chosen, why certain business rules are tested at the mart layer instead of staging, and what known gaps exist.
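A skeleton for such a document might look like this (the section names are only a suggestion):

```markdown
# Data Quality Strategy

## What is tested, and at which layer
- Staging: schema presence, types, value ranges, null patterns
- Intermediate: join integrity, material code resolution, deduplication
- Mart: business rules, freshness, consumer contracts

## Thresholds and why they were chosen

## Why certain rules are tested at the mart layer instead of staging

## Known gaps (failures no test would catch)
```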
Check: run dbt test and report how many tests exist at each layer (staging, intermediate, and mart). Confirm that at least one business logic test exists at the mart layer (e.g., cost attribution totals match source sums).