The Brief
U Kyaw Zin Oo runs Golden Ayeyarwady Rice Mill in Pathein, Myanmar. Two mills in the Ayeyarwady Delta, 200 metric tons of paddy processed every day, purchased from local farmers and milled into white and parboiled rice for export to Yangon and Thailand.
Every morning he needs the numbers from yesterday. Paddy received, from which farmers, what moisture, what grade, how much rice produced, where it shipped. Both mills have systems that log everything, but the data sits in two different formats and getting it into one place without errors has been the problem. When a correction happens -- a supervisor discovers yesterday's numbers were wrong -- the fix creates more confusion than the original mistake.
He left a voicemail.
Your Role
You're building a pipeline that loads daily data from both mills automatically, handles corrections without creating duplicates, and produces the morning operational reports Kyaw Zin Oo checks before heading to the mills.
Before you write a line of pipeline code, you'll build the infrastructure that makes every AI session on this project better than it would be otherwise: a project memory file that carries the data dictionary, naming conventions, known issues, and design decisions from session to session. You'll see for yourself the difference between directing AI with that infrastructure and directing AI without it.
Templates provide structure for the new terrain. Guides are thinner than last time.
What's New
Last time you designed schemas that track change over time -- SCD strategies for Katrine's turbines, MetricFlow for standardized metrics, PII masking across every output surface. The schema design was the hard part.
This time the extraction design is the hard part. The data arrives daily, but loading it isn't as simple as adding each day's file. Corrections mean the same records show up again. Loading them alongside the originals inflates the numbers without any error message. You'll design the extraction pattern that handles this -- and you'll build the AI infrastructure before you build the pipeline.
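The core of that extraction pattern is a keyed upsert: each record is identified by a natural key, and a re-sent record replaces the original instead of being appended. Here is a minimal plain-Python sketch of the idea; the field names (`mill_id`, `report_date`, `paddy_tons`) are illustrative assumptions, not the mills' actual schemas.

```python
def upsert(target: dict, batch: list[dict]) -> dict:
    """Merge a daily batch into target, keyed by (mill_id, report_date).

    A correction arrives with the same key as the original record, so it
    overwrites that record rather than adding a duplicate row -- the
    totals stay correct without any manual cleanup.
    """
    for record in batch:
        key = (record["mill_id"], record["report_date"])
        target[key] = record  # insert if new, replace if a correction
    return target

loaded: dict = {}
# Day one: the original figure arrives.
upsert(loaded, [{"mill_id": 1, "report_date": "2024-06-01", "paddy_tons": 102.0}])
# Day two: a supervisor's correction re-sends the same key with a new value.
upsert(loaded, [{"mill_id": 1, "report_date": "2024-06-01", "paddy_tons": 98.5}])
```

In the actual pipeline this same logic is expressed as a MERGE (or `INSERT ... ON CONFLICT`) in the warehouse, but the invariant is identical: one row per natural key, latest version wins.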
This project also introduces meta-prompting for verification: directing AI to check its own work against structured criteria, then evaluating whether the check was thorough.
Tools
- dbt Core with DuckDB adapter -- transformation framework, with MERGE patterns (new this project)
- DuckDB -- local analytical database
- Soda Core -- quality monitoring
- Dagster -- orchestration, with watermark monitoring (new this project)
- GitHub Actions -- CI/CD, with pre-commit and pre-push hooks (new this project)
- Claude Code -- AI directing tool, with project memory authoring and meta-prompting for verification (new this project)
- Git / GitHub -- version control
Materials
You'll receive:
- Mill data -- daily export samples from both mills (CSV from Mill 1, JSON from Mill 2), including a correction scenario for testing
- Field mapping -- how the two mills' different field names correspond to each other
- Pipeline spec template -- an empty structure to fill in based on what you learn from Kyaw Zin Oo
- Project memory template -- format guide for CLAUDE.md and AGENTS.md with examples of good vs vague entries
- Incremental extraction guide -- conceptual overview of full vs incremental refresh, watermarks, and MERGE patterns
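To preview the watermark concept from that guide: each run records the latest event date already loaded and pulls only rows newer than it. A minimal sketch, with an assumed row shape:

```python
from datetime import date

def incremental_load(rows: list[dict], watermark: date) -> tuple[list[dict], date]:
    """Return only rows newer than the watermark, plus the advanced watermark."""
    fresh = [r for r in rows if r["event_date"] > watermark]
    # Advance the watermark to the newest date seen; keep it if nothing was new.
    new_watermark = max((r["event_date"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"event_date": date(2024, 6, 1), "paddy_tons": 95.0},
    {"event_date": date(2024, 6, 2), "paddy_tons": 101.5},
]
fresh, wm = incremental_load(rows, date(2024, 6, 1))  # only the June 2 row is new
```

Note the limitation this sketch exposes: a watermark alone skips corrections, because a re-sent record carries an old date. That is why the guide pairs watermarks with MERGE patterns.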