
The Brief

Assel Nurzhanova is the Operations Director at Astana Grain Terminal, a grain storage company in Kazakhstan's Kostanay region. The company operates two elevators with a combined capacity of 120,000 tonnes, storing wheat, barley, and flax for export.

Assel's problem is spoilage. She has lost 800 tonnes over the past two years to moisture damage and temperature-related degradation. She suspects this correlates with weather patterns, but right now someone checks a weather website every morning and writes the numbers in a notebook. She needs the weather data pulled automatically and combined with her storage data so she can see which conditions cause spoilage.

The storage system exports CSV files daily from both elevators. She has 18 months of these exports. What she does not have is weather data in a format she can compare against storage readings.

Your Role

You're building the pipeline that combines Assel's storage data with weather data from an API. Load the storage CSVs from both elevators into DuckDB, extract weather data from the Open-Meteo API, design a schema that brings them together, and verify the combined output matches expected values.

This time, you design the schema. In the last project, it was provided. Now you profile the data sources and decide what the target structure should look like: which staging tables you need, what one row represents in the fact table, and how the two sources join.

You'll direct Claude Code through focused, sequential requests. What to ask and in what order is your decision. The pipeline spec tells you what Assel needs. How you decompose the work into tasks for AI is up to you.

What's New

Last time you loaded CSV files and verified row counts against a known target. This time, one of your sources is a live API. It paginates, returns timestamps in a different timezone, and can silently return fewer records than you expect. The "did everything arrive?" question is the same one you asked with Carlos's honey data. The verification technique is different.

The other new piece is schema design. You profiled nothing last time -- the schema was documented and you implemented it. Now you profile both data sources, decide on the grain (what one row represents), and design the staging and mart layers yourself. If the grain is wrong, every downstream number is wrong.
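A quick way to test a candidate grain during profiling is to check whether the grain columns uniquely identify every row. A sketch, with column names invented for illustration:

```python
# Sketch: does "one row per elevator, per silo, per day" actually hold?
# Column names here are assumptions, not the real export layout.
from collections import Counter

def grain_violations(rows: list[dict], grain: tuple[str, ...]) -> list[tuple]:
    """Return grain-key combinations that appear more than once."""
    counts = Counter(tuple(r[c] for c in grain) for r in rows)
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"elevator": "A", "silo": "S1", "date": "2024-01-01", "moisture": 12.4},
    {"elevator": "A", "silo": "S1", "date": "2024-01-01", "moisture": 12.6},  # duplicate key
    {"elevator": "B", "silo": "S1", "date": "2024-01-01", "moisture": 11.9},
]
print(grain_violations(rows, ("elevator", "silo", "date")))
```

A non-empty result means either the chosen grain is wrong (perhaps readings are hourly, not daily) or the raw data needs deduplication in staging; both findings change the schema design.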

The hard part is the extraction boundary. The API call completes, reports success, and you have data. Whether you have all the data is a separate question -- and the API does not answer it for you.
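One way to answer that question yourself: compute how many records the requested date range should produce and compare against what arrived. The payload shape below mirrors Open-Meteo's hourly JSON (parallel arrays under an `"hourly"` key), but the dates and field names are illustrative:

```python
# Sketch: the "did everything arrive?" check for an hourly weather API.
# An hourly request for an inclusive date range should yield days * 24
# records; anything less is silent truncation, not success.
from datetime import date

def expected_hourly_records(start: date, end: date) -> int:
    """Hours in the inclusive date range start..end."""
    return ((end - start).days + 1) * 24

def verify_completeness(payload: dict, start: date, end: date) -> None:
    got = len(payload["hourly"]["time"])
    want = expected_hourly_records(start, end)
    if got != want:
        raise ValueError(f"expected {want} hourly records, got {got}")

# Simulated response: 2 days of hourly timestamps (48 records).
fake_payload = {
    "hourly": {
        "time": [f"2024-01-0{d}T{h:02d}:00" for d in (1, 2) for h in range(24)]
    }
}
verify_completeness(fake_payload, date(2024, 1, 1), date(2024, 1, 2))  # passes
```

Raising on a shortfall (rather than logging and continuing) is deliberate: a pipeline that loads a truncated extract produces numbers that look plausible and are wrong.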

Tools

  • Python -- via your Miniconda `de` environment
  • DuckDB -- analytical database for both storage and weather data
  • SQL -- for staging and mart transformations
  • requests -- Python library for API extraction (new this project)
  • Claude Code -- your AI agent, doing the implementation work
  • Git / GitHub -- version control

Materials

You'll receive:

  • Pipeline specification -- what to build, what Assel needs, verification targets
  • Storage data -- 6 months of CSV exports from both elevators
  • Verification checklist -- row counts, staging counts, spoilage-weather correlation spot-checks
  • Project governance file -- CLAUDE.md with the full ticket breakdown