Learn by Directing AI
Unit 6

Present the results to Mihai and close the project

Step 1: Query the mart for Mihai's questions

Mihai wants three things: profitability by variety, yield per variety, and a quarterly summary for his accountant. The mart already has most of this. Start with the core question -- which cheese makes money?

SELECT
  variety,
  total_milk_cost,
  total_revenue,
  yield,
  profit_margin
FROM fct_variety_profitability
ORDER BY profit_margin DESC;

Six rows, ordered from most to least profitable. The ordering makes the answer immediate -- cas at the top, urda at the bottom. Scan the numbers against what you verified in Unit 5. These are the same values you checked against materials/verification-checklist.md, so the profitability should already be correct.

Now the quarterly summary. Direct Claude to produce it:

Query the DuckDB database to produce a quarterly revenue summary from stg_sales -- total revenue per quarter, broken down by variety. Use date_trunc('quarter', sale_date) for the grouping. Order by quarter, then by revenue descending within each quarter.

Review the output. Mihai's accountant needs this for tax reporting -- the quarterly totals should be consistent with the annual revenue figures in the profitability mart.
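Before accepting Claude's output, it helps to know roughly what the query should look like. A sketch, assuming stg_sales exposes sale_date, variety, and a revenue column (the column name revenue is an assumption -- match it to the actual staging schema):

```sql
-- Quarterly revenue per variety (sketch).
-- "revenue" is an assumed column name; adjust to the stg_sales schema.
SELECT
  date_trunc('quarter', sale_date) AS quarter,
  variety,
  SUM(revenue) AS total_revenue
FROM stg_sales
GROUP BY quarter, variety
ORDER BY quarter, total_revenue DESC;
```

A quick consistency check: summing total_revenue across the four quarters of a year should match each variety's annual figure in fct_variety_profitability.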

Step 2: Present results to Mihai

Open the Slack channel with Mihai. Share the profitability results and explain what the numbers mean.

The key findings to communicate: Cas has the highest margin because it combines good yield with moderate pricing. Telemea is the volume leader -- solid margin across more kilograms sold. Urda sits at the bottom because whey cheese, despite its high yield, sells at a fraction of the price. The aging factor matters most for cascaval and branza de burduf -- longer aging ties up capital and increases cost, which compresses their margins even though they command higher prices per kilo.

Pick a suggested message that presents these findings clearly. Mihai is not a data person -- he thinks in terms of cheese wheels and shepherd relationships, not margins and yields. Translate the numbers into his language: which varieties to make more of, which ones barely break even, and why the accounting didn't show this before (because production cost, sales revenue, and milk purchases lived in three separate spreadsheets with no connection between them).

Mihai responds warmly. He's excited to see the numbers laid out clearly. "I always thought burduf was the most profitable because restaurants pay more, but I didn't account for the yield." The surprise is genuine -- he's been producing burduf in larger quantities based on a wrong assumption about profitability.

Then he asks: "Can we also see which restaurants order which varieties? I want to know if I should make more burduf because Restaurant Horezu keeps ordering it."

Step 3: Manage the scope request

Mihai's question is reasonable. Per-customer-per-variety analysis would tell him which buyers drive demand for specific varieties. The data exists -- stg_sales has both customer_name and variety. The mart could be extended with a fct_customer_variety model.

But this is a scope expansion. The original request was variety-level profitability, and that's what you built and verified. Adding per-customer analysis means new models, new tests, new verification -- and potentially new questions from Mihai once he sees which restaurants order what.

Acknowledge the value of the request. Explain that the current pipeline is verified and complete for the variety-level question. The per-customer analysis belongs in a future phase -- the staging layer already has the data, so extending the pipeline would be straightforward, but it's separate work with its own verification needs.
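For the record, the deferred model would likely be a thin aggregation over the existing staging layer. A sketch, assuming stg_sales has customer_name, variety, and a revenue column (the revenue name is an assumption):

```sql
-- models/marts/fct_customer_variety.sql (sketch of deferred work)
-- Per-customer, per-variety sales; "revenue" is an assumed column name.
SELECT
  customer_name,
  variety,
  COUNT(*)     AS order_count,
  SUM(revenue) AS total_revenue
FROM {{ ref('stg_sales') }}
GROUP BY customer_name, variety
```

Simple as it looks, it would still need its own schema.yml entries and a verification pass before Mihai relies on it -- which is exactly why it belongs in a future phase.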

This is scope management. The instinct to say "sure, I can add that quickly" is strong, especially when the query would take five minutes. But verified work means verified scope. The pipeline you built has been tested, checked against known values, and confirmed idempotent. Extending it without the same rigor creates a gap between the verified core and the unverified addition.

Step 4: Write the pipeline summary

Direct Claude to write a pipeline summary document:

Write a README.md for the branzeria_carpati dbt project. Include: what the pipeline does (connects three data sources to produce variety-level profitability for an artisan cheese operation), what sources it reads (production log, sales, milk purchases), the dbt project structure (staging models, mart model, source definitions), what the tests verify (uniqueness, nulls, accepted values, referential integrity), and what the profitability output means (which varieties are most/least profitable and why). Keep it concise -- this is for someone who opens the project six months from now and needs to understand what it does.

Review what Claude produces. The summary should be accurate to what you actually built -- not what was planned, but what exists in the repo right now. Check that the model names match (stg_production, stg_sales, stg_purchases, fct_variety_profitability), the test descriptions match what's in schema.yml, and the profitability findings reflect the verified numbers.

Open materials/pipeline-spec.md one more time. Compare the original requirements against what the pipeline delivers. Everything Mihai asked for -- profitability by variety, yield per variety, quarterly summary -- should be accounted for. The per-restaurant analysis is the one item that's explicitly deferred.

Step 5: Commit and push

The project is complete. Commit the work to Git with meaningful messages that describe what each piece does -- not "add files" but messages that tell the story of the pipeline's construction.

Stage and commit all work in the branzeria_carpati project. Use separate commits for logical groupings: the dbt project structure and source configuration, the staging models, the mart model, the test suite in schema.yml, and the README. Each commit message should describe what the piece does and why -- for example, "Add stg_production staging model with source-conforming columns" rather than "add model". After committing, push to GitHub.

Review the commit history after Claude finishes. The progression should be visible: project setup, then staging, then mart, then tests, then documentation. Someone reading the Git log should understand how the pipeline was built -- profiling first, then structure, then business logic, then quality gates.

The dbt project is the deliverable. Three staging models clean and standardize the raw sources. One mart model joins them and calculates profitability. The test suite in schema.yml declares what "correct" means -- unique batches, no null shepherds, six known varieties, and every shepherd traceable to a purchase record. The pipeline runs idempotently. The numbers are verified. Mihai has his answer.


✓ Check

✓ Check: The repository contains a dbt project with staging models, a mart model, a schema.yml with tests, and a commit history showing the progression from profiling through staging, mart, and testing.

Project complete

Nice work. Ready for the next one?