Learn by Directing AI
Unit 5

Build the mart layer and morning reports

Step 1: Build the daily operations mart

This is what Kyaw Zin Oo checks every morning. Build fct_daily_operations -- a mart model that answers: what came in yesterday, from which farmers, at what moisture and grade, how much rice was produced, where it shipped.

The mart pulls from int_daily_operations and aggregates by mill and date. Kyaw Zin Oo wants the two mills side by side: yield, throughput, grade distribution per mill. One row per mill per day.
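A minimal sketch of what that aggregation could look like. All column names here (mill_id, operation_date, farmer_id, paddy_weight_kg, rice_output_kg, grade) are assumptions — match them to the unified field names you actually produced in int_daily_operations.

```sql
-- models/marts/fct_daily_operations.sql  (sketch; column names assumed)
select
    mill_id,
    operation_date,
    count(distinct farmer_id)                            as farmers_delivering,
    sum(paddy_weight_kg)                                 as total_paddy_kg,
    sum(rice_output_kg)                                  as total_rice_kg,
    -- nullif guards against division by zero on days with no intake
    round(100.0 * sum(rice_output_kg)
            / nullif(sum(paddy_weight_kg), 0), 1)        as yield_pct,
    sum(case when grade = 'A' then paddy_weight_kg end)  as grade_a_paddy_kg
from {{ ref('int_daily_operations') }}
group by mill_id, operation_date
```

The `group by mill_id, operation_date` is what enforces the one-row-per-mill-per-day grain.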

Watch the grain. A join between the intermediate model and a lookup table can silently inflate the row count if the join key is not unique on both sides. If fct_daily_operations shows 40% more paddy received than the staging tables, a many-to-many join is the likely cause.
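One way to catch this before the join ever runs: check that the join key is unique on the lookup side. The table and column names below are illustrative.

```sql
-- Any rows returned here mean a join on mill_id will fan out.
select mill_id, count(*) as n
from {{ ref('lookup_mills') }}
group by mill_id
having count(*) > 1
```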

Step 2: Build the farmer tracking dimension

Build dim_farmers: one row per farmer, tracking deliveries and payments over time. This is where the advance payment records from Mill 2 matter -- a farmer who received an advance payment but has not yet delivered paddy must still appear in this dimension, with the payment recorded.
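A sketch of the shape this model could take. Model and column names (stg_payments, payment_type, amount) are assumptions; the point is the full outer join, which keeps farmers who appear only on the payments side.

```sql
-- models/marts/dim_farmers.sql  (sketch; source names assumed)
with deliveries as (
    select farmer_id,
           min(delivery_date)   as first_delivery,
           max(delivery_date)   as last_delivery,
           sum(paddy_weight_kg) as lifetime_paddy_kg
    from {{ ref('int_daily_operations') }}
    group by farmer_id
),
payments as (
    select farmer_id,
           sum(amount) as total_paid,
           sum(case when payment_type = 'advance' then amount end) as advances_paid
    from {{ ref('stg_payments') }}
    group by farmer_id
)
select
    -- coalesce keeps farmers with an advance but no deliveries yet
    coalesce(d.farmer_id, p.farmer_id)  as farmer_id,
    d.first_delivery,
    d.last_delivery,
    coalesce(d.lifetime_paddy_kg, 0)    as lifetime_paddy_kg,
    coalesce(p.total_paid, 0)           as total_paid,
    coalesce(p.advances_paid, 0)        as advances_paid
from deliveries d
full outer join payments p on d.farmer_id = p.farmer_id
```

An inner join here would silently drop exactly the farmers the advance-payment requirement is about.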

Kyaw Zin Oo's daughter wants year-over-year comparison for relationship building: which farmers are delivering more or less paddy compared to the same period last year? That comparison depends on having complete farmer history, including advance payments.
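The year-over-year comparison could be sketched as a query like this. Date arithmetic syntax varies by warehouse, and the 90-day window is a placeholder — adjust both to your environment.

```sql
-- Per-farmer YoY comparison (sketch; interval syntax is warehouse-specific)
select
    farmer_id,
    sum(case when operation_date >= current_date - interval '90 day'
             then paddy_weight_kg end) as paddy_this_period_kg,
    sum(case when operation_date >= current_date - interval '455 day'
              and operation_date <  current_date - interval '365 day'
             then paddy_weight_kg end) as paddy_same_period_last_year_kg
from {{ ref('int_daily_operations') }}
group by farmer_id
```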

Step 3: Add dbt tests

Add tests that verify the pipeline's structural integrity:

  • Unique constraints on natural keys (the same MERGE keys you designed in Unit 3)
  • Not-null on critical columns (farmer_name, mill_id, mill_date)
  • Business logic tests: daily totals should be positive, yield percentages between 0 and 100, payment amounts non-negative
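The bullets above could translate into a schema file along these lines. This assumes the dbt_utils package is installed; column names are illustrative.

```yaml
# models/marts/schema.yml  (sketch; column names assumed)
version: 2
models:
  - name: fct_daily_operations
    tests:
      # uniqueness on the natural key = the grain from Unit 3
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [mill_id, operation_date]
    columns:
      - name: mill_id
        tests: [not_null]
      - name: total_paddy_kg
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
      - name: yield_pct
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100
```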

These are row-level tests. They catch individual records that violate constraints. They do not catch batch-level anomalies -- that is what Soda Core handles next.

Step 4: Add Soda Core trend checks

Soda Core catches patterns that row-level tests miss. Configure checks for:

  • Row count within expected range of yesterday. If today's load has 50% fewer rows than yesterday, something went wrong with the extraction even though every individual row passes its tests.
  • Average paddy weight within reasonable bounds. A sudden shift in average weight suggests a unit conversion error or a data source change.
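In SodaCL, those two checks could be sketched as below. The thresholds are placeholders, and note that change-over-time checks compare against stored measurement history, which in Soda Core requires a Soda Cloud connection.

```yaml
# checks/fct_daily_operations.yml  (SodaCL sketch; thresholds are placeholders)
checks for fct_daily_operations:
  - row_count > 0
  # average intake weight within plausible bounds -- tune to your mills
  - avg(total_paddy_kg) between 500 and 50000
  # batch-level trend check against historic measurements (needs Soda Cloud)
  - change avg last 7 for row_count < 50%
```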

These trend checks are the "does this batch look normal?" layer. A pipeline that passes all dbt tests but loads half the expected data has no row-level failures -- it has a batch-level failure that only trend analysis catches.

Step 5: Apply meta-prompting for verification

Meta-prompting is directing AI to verify its own output using structured criteria. Not "check this" -- "check this against these specific criteria."

Direct AI to review fct_daily_operations with a structured verification prompt:

  Review fct_daily_operations. Check:
  1. Does the grain match one row per mill per day?
  2. Are all calculations using the unified field names from int_daily_operations, not raw source names?
  3. Could any join inflate the row count?

AI will produce a review. Your job is to evaluate the review -- did AI actually check each criterion, or did it produce a generic "looks good" response? Meta-prompting is a verification tool, not an oracle. The quality of the verification depends on the specificity of the criteria you provide.

Then verify the mart numbers yourself. Query fct_daily_operations for a day's data. Sum the paddy received. Now sum paddy_weight_kg directly from the staging tables for the same day. The numbers should match within rounding. If they do not, trace the discrepancy.
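That reconciliation can be done in one query. The staging table names, column names, and date below are illustrative — substitute your own.

```sql
-- Mart vs. staging reconciliation for one day (sketch; names and date assumed)
select 'mart' as source, sum(total_paddy_kg) as paddy_kg
from fct_daily_operations
where operation_date = '2024-11-18'
union all
select 'staging', sum(paddy_weight_kg)
from (
    select delivery_date, paddy_weight_kg from stg_mill1_deliveries
    union all
    select delivery_date, paddy_weight_kg from stg_mill2_deliveries
) s
where delivery_date = '2024-11-18'
```

If the two rows disagree by more than rounding, suspect the join fan-out from Step 1 before anything else.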

Message Kyaw Zin Oo with a sample of the morning report. Show him the two mills side by side -- paddy received, grade distribution, yield per mill. He will be pleased to see the numbers he has been asking for.

✓ Check

Check: Query fct_daily_operations for yesterday's data. Does the total paddy received match what you'd calculate by summing the staging tables directly?