Learn by Directing AI

The Brief

Mihai Popescu runs a small artisan cheese operation in Sibiu, Romania. He makes six traditional varieties -- telemea, cascaval, branza de burduf, and others -- from local sheep and cow milk. Eight years of data sit in three spreadsheets maintained by different people: a production log tracking batches, milk input, cheese output, and aging dates; a sales record with customers, quantities, and prices; and a milk purchase ledger with shepherd names, liters, fat content, and costs.

His accountant asks every quarter which cheese variety actually makes money after milk cost, aging time, and waste. Mihai cannot answer. The three spreadsheets don't connect. Production tracks batches. Sales track customers. There is no shared identifier linking a batch of telemea to the revenue it generated. The profitability question requires combining all three sources -- and the join between them is not straightforward.
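To make the gap concrete, here is a sketch of the kind of join the mart layer will eventually need. Every table and column name below is an assumption for illustration, not taken from Mihai's actual spreadsheets; with no shared batch identifier, the join has to fall back on softer keys like variety plus date proximity:

```sql
-- Illustrative only: assumed columns are
--   production(variety, batch_date, output_kg)
--   sales(variety, sale_date, quantity_kg, price_per_kg)
select
    p.variety,
    p.batch_date,
    sum(s.quantity_kg * s.price_per_kg) as revenue
from production p
join sales s
  on s.variety  = p.variety
 and s.sale_date >= p.batch_date                      -- a sale can't precede its batch
 and s.sale_date <  p.batch_date + interval 90 day    -- assumed aging window bounds the match
group by p.variety, p.batch_date;
```

Note what's wrong with this sketch: the 90-day window is a guess, and a sale falling inside two batches' windows gets counted twice. That ambiguity is exactly why the join "is not straightforward" and why it becomes a design decision in the project rather than a one-liner.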

Your Role

You're building a dbt project that connects Mihai's milk purchases to production batches to sales, producing a profitability model by cheese variety with automated quality tests.

This is your first project using a transformation framework. The SQL is the same kind you wrote in P1 and P2 -- staging models, mart tables, type casting, joins. What's new is that the SQL lives inside dbt models with declared dependencies, naming conventions, and automated tests. You don't write scripts that run top-to-bottom. You write models that dbt builds in dependency order.
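As a sketch of what "models with declared dependencies" means in practice (model and column names here are illustrative, not from the project template): a mart model references its staging inputs with `ref()`, and dbt reads those references to build the staging models first:

```sql
-- models/marts/variety_profitability.sql (illustrative name)
-- dbt sees the ref() calls below and schedules stg_sales and
-- stg_production before this model -- no manual run order.
select
    s.variety,
    sum(s.revenue) - sum(p.milk_cost) as gross_margin
from {{ ref('stg_sales') }} s
join {{ ref('stg_production') }} p
    on p.variety = s.variety
group by s.variety
```

The `ref()` call is the whole mechanism: it both resolves to the right schema-qualified table name at compile time and registers the edge in the dependency graph.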

You'll direct Claude Code the same way -- focused, sequential requests. The pipeline spec tells you what Mihai needs. How you break the work into tasks is your call.

What's New

Last time you extracted API data, designed a schema from source profiles, and built staging and mart layers in raw SQL scripts. You handled idempotency manually -- making sure re-running the pipeline didn't create duplicates.

This time, dbt handles idempotency for you. Models use CREATE OR REPLACE by default, so running the pipeline twice produces the same result without manual checks. The framework encodes a property you had to build yourself before.
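Concretely, a model file contains only a SELECT; dbt wraps it in the DDL. A sketch of a staging model under dbt-duckdb's default view materialization (file name, source name, and columns are assumptions for illustration):

```sql
-- models/staging/stg_milk_purchases.sql (illustrative)
-- You write only the SELECT; on each run dbt emits roughly:
--   create or replace view stg_milk_purchases as ( ...this query... )
-- so a second `dbt run` replaces the object instead of duplicating rows.
select
    shepherd_name,
    cast(liters      as double) as liters,
    cast(fat_content as double) as fat_pct,
    cast(cost        as double) as cost_ron
from {{ source('raw', 'milk_purchases') }}
```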

The genuinely new piece is automated testing. dbt's built-in tests -- unique, not_null, accepted_values, relationships -- check structural properties of your data after every run. But structural correctness is not the same as business correctness. All tests can pass while the profitability numbers are wrong. Deciding which tests actually protect Mihai's numbers -- and which just confirm things the database already guarantees -- is the real decision in this project.
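For orientation, the built-in tests are declared in a YAML file alongside the models. The sketch below uses assumed model and column names -- in particular, it assumes a later ticket derives a shared batch key, which the raw sources do not have -- and a deliberately partial variety list:

```yaml
# models/staging/schema.yml (illustrative names and values)
version: 2
models:
  - name: stg_production
    columns:
      - name: batch_id
        tests: [unique, not_null]          # structural: one row per batch
      - name: variety
        tests:
          - accepted_values:
              values: ['telemea', 'cascaval', 'branza de burduf']  # partial list
  - name: stg_sales
    columns:
      - name: batch_id
        tests:
          - relationships:                 # only meaningful once a shared key exists
              to: ref('stg_production')
              field: batch_id
```

Every test here checks structure. None of them would catch a milk cost joined to the wrong batch, which is why the text above draws the line between structural and business correctness.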

Tools

  • Python -- via your Miniconda dev environment
  • DuckDB -- analytical database for all three sources
  • SQL -- transformation logic inside dbt models
  • dbt Core + dbt-duckdb adapter -- transformation framework (new this project). The unit that introduces dbt walks through setup.
  • Claude Code -- your AI agent, doing the implementation work
  • Git / GitHub -- version control

Materials

You'll receive:

  • Pipeline specification -- what to build, Mihai's requirements, verification targets
  • Three data source files -- production log, sales records, milk purchases
  • dbt project template -- project structure, dbt_project.yml, and profiles.yml configured for DuckDB
  • Verification checklist -- row counts, staging counts, profitability spot-checks
  • Project governance file -- CLAUDE.md with the full ticket breakdown
  • Ticket backlog -- work broken into sequenced tickets