Verification Checklist — Mel do Sofala Pipeline
Check your pipeline output against these values at every stage. If the numbers do not match, something is wrong — investigate before moving on.
Source verification
- Total row count across all 12 CSV files: 4,809 rows
- Distinct beekeepers: 170
- Distinct collection points: 6 (Buzi, Chibabava, Dondo, Gorongosa North, Gorongosa South, Marromeu)
Staging verification
- Staging row count must match source exactly: 4,809 rows. No records lost, no records added.
Spot-check these three records in the staging table (after column renaming and weight conversion):
| Source file | Beekeeper | Date | Weight (kg) | Grade | Price (MZN/kg) |
|---|---|---|---|---|---|
| gorongosa-north-q3.csv | Rafael Machel | 2024-08-28 | 3.99 | C | 188.81 |
| buzi-q1.csv | Celestino Gove | 2024-03-20 | 6.42 | A | 345.63 |
| dondo-q1.csv | Simao Chibanga | 2024-03-04 | 11.30 | C | 185.82 |
The Buzi record originated in pounds (the source file shows weight_lbs). After conversion to kilograms, the value should be approximately 6.42 kg. If your staging table shows a weight over 14 for this record, the pound-to-kilo conversion did not happen.
Mart verification
- Expected row count after correct deduplication: 4,650 rows (deduplicated on beekeeper_name + collection_date + collection_point)
- Row count if AI deduplicates on beekeeper name only: 170 rows. If you see this number, the deduplication key is wrong — legitimate separate deliveries were merged. Fix the key to use the composite (beekeeper_name + collection_date + collection_point).
Revenue top-3 collection points (total_value = weight_kg * price_per_kg):
| Rank | Collection point | Total revenue (MZN) |
|---|---|---|
| 1 | Buzi | ~2,497,224 |
| 2 | Marromeu | ~1,904,096 |
| 3 | Dondo | ~1,588,900 |
Quality grade distribution (after correct dedup, excluding nulls):
| Grade | Count | Percentage |
|---|---|---|
| A | 1,411 | 30.3% |
| B+ | 238 | 5.1% |
| B | 1,408 | 30.3% |
| C | 1,593 | 34.3% |
Idempotency verification
Run the entire pipeline a second time. The row counts must match the first run exactly:
- Staging: 4,809
- Mart: 4,650
If the mart table shows 9,300 rows (doubled), the pipeline used INSERT instead of a replacement pattern. Fix it to use CREATE OR REPLACE TABLE or equivalent so re-runs produce identical output.
Carlos's addition
Average price per kilo by collection point (for the request Carlos makes after seeing the initial results):
| Collection point | Average price (MZN/kg) |
|---|---|
| Buzi | 256.09 |
| Chibabava | 253.06 |
| Dondo | 251.21 |
| Gorongosa North | 246.70 |
| Gorongosa South | 251.02 |
| Marromeu | 251.10 |