Step 1: Build the daily operations mart
This is what Kyaw Zin Oo checks every morning. Build fct_daily_operations -- a mart model that answers: what came in yesterday, from which farmers, at what moisture and grade, how much rice was produced, where it shipped.
The mart pulls from int_daily_operations and aggregates by mill and date. Kyaw Zin Oo wants the two mills side by side: yield, throughput, grade distribution per mill. One row per mill per day.
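A minimal sketch of what that mart could look like. The grain and the source model (int_daily_operations, mill_id, mill_date, paddy_weight_kg) come from this exercise; the other column names (rice_output_kg, moisture_pct, grade) are illustrative assumptions, not a prescribed schema:

```sql
-- models/marts/fct_daily_operations.sql
-- One row per mill per day. Column names beyond mill_id, mill_date,
-- and paddy_weight_kg are assumptions for illustration.

with daily as (

    select
        mill_id,
        mill_date,
        count(distinct farmer_name)                                as farmers_delivering,
        sum(paddy_weight_kg)                                       as total_paddy_kg,
        sum(rice_output_kg)                                        as total_rice_kg,
        avg(moisture_pct)                                          as avg_moisture_pct,
        sum(case when grade = 'A' then paddy_weight_kg else 0 end) as grade_a_kg
    from {{ ref('int_daily_operations') }}
    group by mill_id, mill_date

)

select
    *,
    round(100.0 * total_rice_kg / nullif(total_paddy_kg, 0), 1) as yield_pct
from daily
```

Aggregating before any joins keeps the grain fixed at one row per mill per day, which matters for the next point.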
Watch the grain. A join between the intermediate model and a lookup table can silently inflate the row count if the join key is not unique on both sides. If fct_daily_operations shows 40% more paddy received than the staging tables, a many-to-many join is the likely cause.
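One way to catch a fan-out before it happens is to prove the lookup side is unique on the join key. A sketch, assuming a hypothetical mill lookup model:

```sql
-- Grain check: if this returns any rows, joining int_daily_operations
-- to the lookup on mill_id will fan out and inflate totals.
-- 'stg_mill_lookup' is a hypothetical model name.
select
    mill_id,
    count(*) as n
from {{ ref('stg_mill_lookup') }}
group by mill_id
having count(*) > 1
```

If the check returns rows, deduplicate or aggregate the lookup to one row per mill_id before joining, rather than joining it raw.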
Step 2: Build the farmer tracking dimension
Build dim_farmers, which tracks each farmer's deliveries and payments over time. This is where the advance payment records from Mill 2 matter -- a farmer who received an advance payment but has not yet delivered paddy should appear in this dimension with the payment recorded.
Kyaw Zin Oo's daughter wants year-over-year comparison for relationship building: which farmers are delivering more or less paddy compared to the same period last year? That comparison depends on having complete farmer history, including advance payments.
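The year-over-year comparison can be sketched as a query against the intermediate model. Column names beyond farmer_name, mill_date, and paddy_weight_kg, plus the date arithmetic, are assumptions (Postgres-style syntax shown; adjust for your warehouse):

```sql
-- Paddy delivered per farmer: year to date vs. the same period last year.
with this_year as (
    select farmer_name, sum(paddy_weight_kg) as paddy_kg
    from {{ ref('int_daily_operations') }}
    where mill_date >= date_trunc('year', current_date)
    group by farmer_name
),

last_year as (
    select farmer_name, sum(paddy_weight_kg) as paddy_kg
    from {{ ref('int_daily_operations') }}
    where mill_date >= date_trunc('year', current_date) - interval '1 year'
      and mill_date <  current_date - interval '1 year'
    group by farmer_name
)

select
    coalesce(t.farmer_name, l.farmer_name) as farmer_name,
    t.paddy_kg as this_year_kg,
    l.paddy_kg as last_year_kg
from this_year t
full outer join last_year l
    on t.farmer_name = l.farmer_name
```

The full outer join is deliberate: a farmer who delivered last year but not this year (or vice versa) should still appear in the comparison.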
Step 3: Add dbt tests
Add tests that verify the pipeline's structural integrity:
- Unique constraints on natural keys (the same MERGE keys you designed in Unit 3)
- Not-null on critical columns (farmer_name, mill_id, mill_date)
- Business logic tests: daily totals should be positive, yield percentages between 0 and 100, payment amounts non-negative
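The tests above could be declared in a schema file along these lines. This is a sketch, not a prescribed config: the column names are assumptions, and the range and combination tests require the dbt_utils package to be installed:

```yaml
# models/marts/schema.yml -- illustrative; assumes dbt_utils is installed.
version: 2

models:
  - name: fct_daily_operations
    tests:
      # natural key: one row per mill per day
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [mill_id, mill_date]
    columns:
      - name: mill_id
        tests: [not_null]
      - name: mill_date
        tests: [not_null]
      - name: yield_pct
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100

  - name: dim_farmers
    columns:
      - name: farmer_name
        tests: [unique, not_null]
```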
These are row-level tests. They catch individual records that violate constraints. They do not catch batch-level anomalies -- that is what Soda Core handles next.
Step 4: Add Soda Core trend checks
Soda Core catches patterns that row-level tests miss. Configure checks for:
- Row count within expected range of yesterday. If today's load has 50% fewer rows than yesterday, something went wrong with the extraction even though every individual row passes its tests.
- Average paddy weight within reasonable bounds. A sudden shift in average weight suggests a unit conversion error or a data source change.
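In SodaCL these checks might look something like the sketch below. The thresholds and column names are assumptions, and the change-over-time syntax should be verified against the SodaCL documentation for your Soda Core version:

```yaml
# checks.yml -- SodaCL sketch; thresholds are illustrative, not tuned.
checks for fct_daily_operations:
  # batch-level: today's row count within 50% of the previous scan
  - change for row_count < 50
  # distribution: daily paddy totals in a plausible band (kg)
  - avg(total_paddy_kg) between 1000 and 50000
```

Soda compares each scan against its stored history, which is exactly the "does this batch look normal?" question that row-level dbt tests cannot ask.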
These trend checks are the "does this batch look normal?" layer. A pipeline that passes all dbt tests but loads half the expected data has no row-level failures -- it has a batch-level failure that only trend analysis catches.
Step 5: Apply meta-prompting for verification
Meta-prompting is directing AI to verify its own output using structured criteria. Not "check this" -- "check this against these specific criteria."
Direct AI to review fct_daily_operations with a structured verification prompt:
```
Review fct_daily_operations. Check:
1. Does the grain match one-row-per-mill-per-day?
2. Are all calculations using the unified field names from int_daily_operations, not raw source names?
3. Could any join inflate the row count?
```
AI will produce a review. Your job is to evaluate the review -- did AI actually check each criterion, or did it produce a generic "looks good" response? Meta-prompting is a verification tool, not an oracle. The quality of the verification depends on the specificity of the criteria you provide.
Then verify the mart numbers yourself. Query fct_daily_operations for a day's data. Sum the paddy received. Now sum paddy_weight_kg directly from the staging tables for the same day. The numbers should match within rounding. If they do not, trace the discrepancy.
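A reconciliation query for that check could look like this. The staging table and column names on the right-hand side are assumptions (substitute your actual staging models), and the date is a sample:

```sql
-- Mart total vs. staging total for one day; the two numbers should
-- match within rounding. Staging names are illustrative assumptions.
select
    (select sum(total_paddy_kg)
       from fct_daily_operations
      where mill_date = '2024-01-15')      as mart_total,

    (select sum(paddy_weight_kg)
       from stg_mill_1_deliveries
      where delivery_date = '2024-01-15')
  + (select sum(paddy_weight_kg)
       from stg_mill_2_deliveries
      where delivery_date = '2024-01-15')  as staging_total
```

If mart_total exceeds staging_total, suspect the join fan-out from Step 1; if it falls short, look for rows dropped by an inner join or a filter.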
Message Kyaw Zin Oo with a sample of the morning report. Show him the two mills side by side -- paddy received, grade distribution, yield per mill. He will be pleased to see the numbers he has been asking for.
Check: Query fct_daily_operations for yesterday's data. Does the total paddy received match what you'd calculate by summing the staging tables directly?