Golden Ayeyarwady Rice Mill -- Data Pipeline
Project
Building a daily operational reporting pipeline for U Kyaw Zin Oo, Managing Director of Golden Ayeyarwady Rice Mill in Pathein, Myanmar. Two mills with combined 200 metric tons/day paddy processing capacity. The pipeline loads daily data from both mills incrementally, handles corrections without creating duplicates, and produces morning operational reports.
Client
U Kyaw Zin Oo needs accurate morning numbers: paddy received, rice produced, shipments dispatched, by mill. Corrections happen frequently (mill supervisors re-export the full day's data when they find errors). He also wants mill-to-mill comparison and farmer payment tracking over time.
Tech stack
- dbt Core with DuckDB adapter -- transformation engine
- DuckDB -- local analytical database
- Dagster -- orchestration and scheduling
- Soda Core -- data quality monitoring
- GitHub Actions -- CI/CD
- Python -- generation scripts and utilities
- Git/GitHub -- version control
Data dictionary
Mill 1 (CSV export)
| Field | Type | Description |
|---|---|---|
| record_id | integer | Sequential record identifier |
| farmer_name | string | Name of farmer delivering paddy |
| paddy_weight_kg | float | Weight of paddy delivered in kilograms |
| moisture_pct | float | Moisture content percentage (target: 14-16%) |
| grade | string | Rice grade: premium, standard, or low |
| price_mmk | integer | Price paid in Myanmar Kyat |
| mill_date | date | Date of milling operation |
| intake_time | timestamp | Exact time of paddy intake |
Mill 2 (JSON export)
| Field | Type | Description |
|---|---|---|
| id | integer | Sequential record identifier |
| supplier_name | string | Name of supplier (same as farmer_name in Mill 1) |
| weight_kg | float | Weight of paddy in kilograms (null for advance payments) |
| moisture_percent | float | Moisture content percentage (null for advance payments) |
| harvest_quality | string | Quality grade: A (premium), B (standard), C (low). Null for advance payments |
| payment_amount | integer | Payment in Myanmar Kyat |
| processing_date | date | Date of processing (ISO format) |
| timestamp | timestamp | Exact time (ISO format) |
Field mapping (Mill 1 -> Mill 2 -> Unified)
| Mill 1 | Mill 2 | Unified name |
|---|---|---|
| farmer_name | supplier_name | farmer_name |
| paddy_weight_kg | weight_kg | paddy_weight_kg |
| moisture_pct | moisture_percent | moisture_pct |
| grade | harvest_quality | grade (map A->premium, B->standard, C->low) |
| price_mmk | payment_amount | price_mmk |
| mill_date | processing_date | mill_date |
| intake_time | timestamp | intake_time |
Known quality issues
- Corrections as re-exports: When Mill 1 finds an error, the supervisor re-exports the entire day's file. The corrected file contains all records for that day -- both correct and corrected ones. There is no flag distinguishing corrections from originals. Loading without MERGE will create duplicates.
- Advance payments: Mill 2 includes advance payment records where a farmer was paid for future paddy delivery. These have payment_amount but null weight_kg and harvest_quality. They are normal business operations, not errors.
- Format divergence: Mill 1 exports CSV, Mill 2 exports JSON. Field names differ (see mapping above). Grade encoding differs (text vs letter code).
dbt naming conventions
- Staging models:
stg_mill1_daily,stg_mill2_daily(source-conform, no business logic) - Intermediate models:
int_daily_operations(unified view from both mills) - Fact models:
fct_daily_operations(daily aggregated report) - Dimension models:
dim_farmers(farmer tracking over time)
Verification targets
- No duplicate records in staging after correction re-load (MERGE working)
- Mill 1 + Mill 2 combined totals match int_daily_operations
- fct_daily_operations totals match staging totals (no join inflation)
- Watermark advances on each incremental run
- Pre-commit hook blocks commits with failing dbt tests
Commit convention
Commit after each meaningful unit of work: project setup, each dbt model, each test suite addition, each infrastructure component (hooks, monitoring). Messages describe what changed and why.