Step 1: Query the mart for Carlos's questions
Carlos needs three things from this data: production totals by collection point, quality grade distribution across the cooperative, and the ability to look up any beekeeper's history. The pipeline you built and verified can answer all three. Now you need the queries.
Direct Claude Code to produce them:
Write three queries against fct_harvests:
1. Total harvests, total weight in kg, and total revenue by collection point — sorted by revenue descending.
2. Quality grade distribution — count and percentage for each grade.
3. A beekeeper history lookup — all records for a single beekeeper, sorted by date. Use "Amina Cossa" as the example.
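The three queries above follow a common aggregate-and-lookup pattern. Here is a minimal, self-contained sketch using Python's stdlib `sqlite3` with a few synthetic rows — the real queries run against `fct_harvests` in your warehouse, and every column name other than `fct_harvests`, `collection_point`, and `price_per_kg` is an assumption for illustration:

```python
import sqlite3

# Synthetic stand-in for fct_harvests. Column names beyond
# collection_point are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE fct_harvests (
    beekeeper_name TEXT, collection_point TEXT, harvest_date TEXT,
    weight_kg REAL, revenue REAL, quality_grade TEXT)""")
con.executemany("INSERT INTO fct_harvests VALUES (?,?,?,?,?,?)", [
    ("Amina Cossa", "Gorongosa North", "2024-03-01", 12.5, 1500.0, "A"),
    ("Amina Cossa", "Gorongosa North", "2024-06-15", 10.0, 1100.0, "B"),
    ("Joao Mabote", "Buzi",            "2024-04-10",  8.0,  880.0, "A"),
])

# 1. Production totals by collection point, revenue descending
totals = con.execute("""
    SELECT collection_point,
           COUNT(*)       AS harvests,
           SUM(weight_kg) AS total_kg,
           SUM(revenue)   AS total_revenue
    FROM fct_harvests
    GROUP BY collection_point
    ORDER BY total_revenue DESC
""").fetchall()

# 2. Quality grade distribution: count and share of all harvests
grades = con.execute("""
    SELECT quality_grade,
           COUNT(*) AS n,
           ROUND(100.0 * COUNT(*) /
                 (SELECT COUNT(*) FROM fct_harvests), 1) AS pct
    FROM fct_harvests
    GROUP BY quality_grade
    ORDER BY quality_grade
""").fetchall()

# 3. One beekeeper's full history, sorted by date
history = con.execute("""
    SELECT harvest_date, collection_point, weight_kg, quality_grade
    FROM fct_harvests
    WHERE beekeeper_name = ?
    ORDER BY harvest_date
""", ("Amina Cossa",)).fetchall()

print(totals)
print(grades)
print(history)
```

The same three `SELECT` statements translate directly to the warehouse once the synthetic table is swapped for the real `fct_harvests`.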
Run them. The numbers should match everything you verified in Units 4 and 5 — same row counts, same revenue top-3, same grade distribution. These are not new checks. They are the same verified numbers, presented in the format a cooperative manager can use.
Step 2: Present the results to Carlos
Open the chat with Carlos. Share what the pipeline produced: twelve collection points consolidated into one queryable database, production totals by region, grade breakdowns, and beekeeper-level traceability.
Carlos responds warmly. He has been waiting for this — scattered spreadsheets made it impossible to answer buyer questions about where the honey came from or how much each collection point produced. Now he can.
He pulls up a name — one of the beekeepers in Gorongosa North — and asks about their history. The beekeeper lookup query answers it immediately. He notices production patterns across the collection points that were invisible when the data lived in separate files.
Then he asks: "Could we also calculate the average price we pay per kilo by region? I have been wondering if we are paying fairly across the collection points."
Step 3: Add the average price per kilo
This is a reasonable addition. The data already contains price and weight for every harvest record. The calculation is straightforward — average the price per kilo, grouped by collection point.
Direct Claude Code to add it:
Add average price paid per kilo by collection point to the production summary query. Use price_per_kg from fct_harvests, grouped by collection_point. Round to two decimal places.
Run the updated query and check the result. Each collection point should show a plausible average — the prices will vary by region, which is exactly what Carlos wants to see. If any collection point shows a number that looks wildly different from the others, check whether the weight standardization from the staging layer handled that region's data correctly.
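The addition can be sketched the same way. One judgment call worth noting: a plain `AVG(price_per_kg)` weights every harvest record equally, while `SUM(weight_kg * price_per_kg) / SUM(weight_kg)` weights large harvests more. Either may be what Carlos wants; the sketch below (synthetic data, `weight_kg` is an assumed column name) shows both so the difference is visible:

```python
import sqlite3

# Synthetic stand-in for fct_harvests; only collection_point and
# price_per_kg are named in the project, the rest is illustrative.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE fct_harvests (
    collection_point TEXT, weight_kg REAL, price_per_kg REAL)""")
con.executemany("INSERT INTO fct_harvests VALUES (?,?,?)", [
    ("Gorongosa North", 12.5, 120.0),
    ("Gorongosa North", 10.0, 110.0),
    ("Buzi",             8.0,  95.5),
])

rows = con.execute("""
    SELECT collection_point,
           ROUND(AVG(price_per_kg), 2) AS avg_price_per_kg,
           -- weight-weighted alternative: big harvests count more
           ROUND(SUM(weight_kg * price_per_kg) / SUM(weight_kg), 2)
               AS weighted_avg
    FROM fct_harvests
    GROUP BY collection_point
    ORDER BY avg_price_per_kg DESC
""").fetchall()
print(rows)
```

If the two columns diverge sharply for one region, that is itself a signal: a few very large, very cheap (or expensive) harvests are pulling the weighted figure away from the simple average.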
Step 4: Share the updated results with Carlos
Send Carlos the updated output with the average price column included. He can now see which collection points pay more or less per kilo — and whether the differences are large enough to investigate.
Carlos is satisfied. The consolidated database gives him what he needed: traceability for the buyers in Maputo, production visibility across all twelve collection points, and now a view into pricing fairness. He mentions that the buyer certifications will be much easier with this data in one place.
Step 5: Write the README
A repository without a README is a list of files with no explanation. Anyone visiting the repo — including you, three months from now — has no idea what the pipeline does or how to run it. The README is the front door.
Direct Claude to write a README for the project: "Write a README.md that describes what this pipeline does, what data it consolidates, how to run it, and what the expected output looks like. Keep it concise."
Review what Claude produces. Does it accurately describe what the pipeline does for Carlos and Mel do Sofala? Does it mention the twelve collection points, the deduplication logic, and how to re-run the pipeline when new data arrives? If the README describes something you did not build, fix it before committing.
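As a yardstick for that review, a README covering those points might be structured like this — a hedged skeleton, not the exact file Claude will write; fill each section with what you actually built:

```markdown
# Mel do Sofala harvest pipeline

Consolidates harvest records from twelve collection points into a
single queryable database (staging + mart) for the cooperative.

## Data
Source spreadsheets live in `materials/`. Records are deduplicated
and weights standardized in the staging layer.

## How to run
<the exact commands you used to load, stage, and build the mart>

## Expected output
`fct_harvests` — one row per harvest, queryable for production
totals, grade distribution, beekeeper history, and average price
per kilo by collection point.
```

The angle-bracket placeholder is deliberate: the run commands depend on your setup and should come from the project, not from a template.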
Step 6: Commit and push
The pipeline works. The output is verified. The client is satisfied. One thing remains: the work exists only on your machine. A pipeline that is not committed is a pipeline that can be lost — and that nobody else can review or run.
Direct Claude Code to commit and push:
Commit all the work to git with a clear message describing what the pipeline does. Then push to GitHub.
The commit should include the SQL scripts for staging and mart, the source data in materials/, and any other files the pipeline needs to run. The commit message should describe what the pipeline does and for whom — not just "add files" but something that tells a future reader what this project built.
Review the commit before pushing. Does the repository contain everything someone would need to understand and run this pipeline — the SQL scripts, the source data, the verification checklist, the project documentation? If anything is missing, add it and amend the commit.
Step 7: Close the project
With the push complete, the repository contains a working, verified pipeline that consolidates harvest data from twelve collection points into a single queryable database for Mel do Sofala.
Check what the repository looks like from the outside:
git log --oneline
The commit history should tell the story of the work — loading data, building staging, building the mart, fixing the deduplication key, fixing the idempotency issue, adding the average price query. A reader who looks at the log should understand what was built and how it progressed.
That is a completed project. Scattered spreadsheets are now consolidated, verified data. Carlos can answer buyer questions about traceability. The pipeline can be re-run safely when next quarter's data arrives.
Check your understanding: the repository should contain the SQL scripts for staging and mart, the source data, a README that describes the pipeline, and a commit history that shows the progression of work.