The Brief
Farid Hassan runs Daun & Co, an organic tea company in Kota Kinabalu, Sabah. He grows specialty teas on an estate in Ranau and sources from certified farms in Cameron Highlands. He sells through three channels: a retail shop, wholesale to cafes and hotels, and a growing online store that ships across Southeast Asia.
Revenue has been flat for two years despite adding new blends. Farid suspects wholesale margins are thinner than they should be and online growth may be masking a retail decline. His data lives in three separate systems -- POS, Shopify, and a manually maintained spreadsheet -- and the product names are different in each one. He tried reconciling them himself and gave up.
He needs someone to combine the three data sources, figure out which products and channels are actually profitable, and document what's wrong with his data so he can fix the systems.
Your Role
You're picking up this analysis. Same tools as last time -- Claude Code, DuckDB, Jupyter, the analytics stack. Your job is still to direct AI through the work and verify what comes back.
What's different: nobody is telling you what the metrics should be. Farid's brief gives you clear requirements, but "which products are profitable" requires you to define what profit means. "Where is the growth" requires you to define growth. You write those definitions before any analysis starts.
AI handles the computation. You handle the definitions, the cleaning decisions, and the investigation. When AI says it cleaned the data, you check what it actually did -- how many rows it dropped and why.
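One way to make that check concrete is to record each cleaning rule as a named mask and count what it removes before anything is dropped. The sketch below is illustrative only: the column names (`order_id`, `product`, `amount`), the sample rows, and the two rules are assumptions, not Farid's actual data or the cleaning rules you will end up choosing.

```python
import pandas as pd

# Hypothetical sample standing in for one of the raw exports.
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product": ["Sabah Black", None, "Ranau Green", "Sabah Black"],
    "amount": [25.0, 18.0, -5.0, 30.0],
})

# Each cleaning rule is a named boolean mask: True means "drop this row".
# These two rules are assumptions chosen for illustration.
rules = {
    "missing product": raw["product"].isna(),
    "non-positive amount": raw["amount"] <= 0,
}

# Count how many rows each rule would remove, before removing anything.
drop_counts = {reason: int(mask.sum()) for reason, mask in rules.items()}

# A row is kept only if no rule flags it.
keep = ~pd.concat(rules, axis=1).any(axis=1)
cleaned = raw[keep]
dropped = raw[~keep]

print(f"rows before: {len(raw)}, after: {len(cleaned)}, dropped: {len(dropped)}")
for reason, count in drop_counts.items():
    print(f"  {reason}: {count}")
```

Keeping `dropped` around means you can open the removed rows and decide whether each rule was the right call, rather than trusting a summary that says the data was cleaned.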
What's New
Last time, everything was provided: analysis spec, metric definitions, chart specs, verification targets, prompts. You focused on the loop itself -- profile, compute, chart, verify, communicate.
This time, the brief is clear but the plan is yours. No analysis spec. No metric definitions. No chart types specified. You decide what to define, what to clean, what to compute, and what to show Farid. The data comes from three systems that don't agree on what the products are called, and one of them stores items as free text instead of structured data.
The hard part is the definitions. When you write "channel revenue equals total invoiced amount minus refunds for completed orders," every number in the analysis inherits that decision. If you define it differently, the numbers change and Farid gets a different picture. AI will produce a definition if you ask, but expect one that is syntactically complete and still missing the edge cases that matter.
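To see how much a definition carries, here is a minimal sketch of the example definition above written as code. The column names (`channel`, `status`, `invoiced`, `refunded`), the status values, and the sample rows are all hypothetical. Treating a missing refund as zero is itself a decision you would need to write down, not something the data tells you.

```python
import pandas as pd

def channel_revenue(orders: pd.DataFrame) -> pd.Series:
    """Channel revenue = invoiced amount minus refunds, completed orders only."""
    # Decision 1: only orders with status "completed" count.
    completed = orders[orders["status"] == "completed"]
    # Decision 2 (an assumption): a missing refund value means no refund.
    net = completed["invoiced"] - completed["refunded"].fillna(0)
    return net.groupby(completed["channel"]).sum()

# Hypothetical combined orders table, for illustration only.
orders = pd.DataFrame({
    "channel": ["retail", "online", "online", "wholesale", "wholesale"],
    "status": ["completed", "completed", "cancelled", "completed", "pending"],
    "invoiced": [100.0, 80.0, 60.0, 500.0, 300.0],
    "refunded": [0.0, 20.0, 0.0, None, 0.0],
})

revenue = channel_revenue(orders)
print(revenue)
```

In this sample the pending wholesale invoice of 300 is excluded. If the definition counted invoiced-but-unpaid orders instead, wholesale revenue would be 800 rather than 500, which is exactly the kind of shift the definition controls.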
Tools
- Python 3.11+ (via Miniconda, "analytics" environment)
- DuckDB
- Jupyter Notebook
- pandas
- matplotlib / seaborn
- Claude Code
- Git / GitHub
Materials
- Three data files -- retail sales (POS export), online orders (Shopify export), and wholesale invoices (operations manager's spreadsheet). Each has different columns, different date formats, and different product names.
- Product catalog -- standard product names and cost-per-unit. This is the authoritative reference for reconciling names across the three sources.
- CLAUDE.md -- project governance file with the client context, tech stack, work breakdown, and verification approach.
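Reconciling names against the product catalog can start with a sketch like the one below: normalize the names, left-join each source onto the catalog, and list whatever fails to match so it can be reviewed by hand. The product names, column names (`product_name`, `cost_per_unit`, `item`, `qty`), and the `normalize` helper are hypothetical. The real files will need their own inspection before any matching rule is trusted.

```python
import pandas as pd

def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())

# Hypothetical catalog rows; the real catalog defines the standard names.
catalog = pd.DataFrame({
    "product_name": ["Ranau Green Tea 50g", "Sabah Black Tea 100g"],
    "cost_per_unit": [12.0, 15.0],
})
catalog["key"] = catalog["product_name"].map(normalize)

# Hypothetical POS rows with inconsistent naming.
pos = pd.DataFrame({
    "item": ["ranau green tea 50g", "Sabah  Black Tea 100g", "SBT-100"],
    "qty": [3, 1, 2],
})
pos["key"] = pos["item"].map(normalize)

# Left join keeps every POS row; indicator=True adds a _merge column
# that marks rows with no catalog match as "left_only".
matched = pos.merge(catalog, on="key", how="left", indicator=True)
unmatched = matched.loc[matched["_merge"] == "left_only", "item"]

print(f"matched: {(matched['_merge'] == 'both').sum()}, unmatched: {len(unmatched)}")
print(unmatched.tolist())
```

The unmatched list doubles as documentation of what is wrong with the data: each entry is a name that one system uses and the catalog does not recognize, which is what Farid needs in order to fix the systems.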