Learn by Directing AI
All materials

verification-checklist.md

Verification Checklist — Mel do Sofala Pipeline

Check your pipeline output against these values at every stage. If the numbers do not match, something is wrong — investigate before moving on.

Source verification

  • Total row count across all 12 CSV files: 4,809 rows
  • Distinct beekeepers: 170
  • Distinct collection points: 6 (Buzi, Chibabava, Dondo, Gorongosa North, Gorongosa South, Marromeu)

Staging verification

  • Staging row count must match source exactly: 4,809 rows. No records lost, no records added.

Spot-check these three records in the staging table (after column renaming and weight conversion):

Source file Beekeeper Date Weight (kg) Grade Price (MZN/kg)
gorongosa-north-q3.csv Rafael Machel 2024-08-28 3.99 C 188.81
buzi-q1.csv Celestino Gove 2024-03-20 6.42 A 345.63
dondo-q1.csv Simao Chibanga 2024-03-04 11.30 C 185.82

The Buzi record originated in pounds (the source file shows weight_lbs). After conversion to kilograms, the value should be approximately 6.42 kg. If your staging table shows a weight over 14 for this record, the pound-to-kilo conversion did not happen.

Mart verification

  • Expected row count after correct deduplication: 4,650 rows (deduplicated on beekeeper_name + collection_date + collection_point)
  • Row count if AI deduplicates on beekeeper name only: 170 rows. If you see this number, the deduplication key is wrong — legitimate separate deliveries were merged. Fix the key to use the composite (beekeeper_name + collection_date + collection_point).

Revenue top-3 collection points (total_value = weight_kg * price_per_kg):

Rank Collection point Total revenue (MZN)
1 Buzi ~2,497,224
2 Marromeu ~1,904,096
3 Dondo ~1,588,900

Quality grade distribution (after correct dedup, excluding nulls):

Grade Count Percentage
A 1,411 30.3%
B+ 238 5.1%
B 1,408 30.3%
C 1,593 34.3%

Idempotency verification

Run the entire pipeline a second time. The row counts must match the first run exactly:

  • Staging: 4,809
  • Mart: 4,650

If the mart table shows 9,300 rows (doubled), the pipeline used INSERT instead of a replacement pattern. Fix it to use CREATE OR REPLACE TABLE or equivalent so re-runs produce identical output.

Carlos's addition

Average price per kilo by collection point (for the request Carlos makes after seeing the initial results):

Collection point Average price (MZN/kg)
Buzi 256.09
Chibabava 253.06
Dondo 251.21
Gorongosa North 246.70
Gorongosa South 251.02
Marromeu 251.10