Learn by Directing AI
Unit 1

The brief

Step 1: Naledi's email

Naledi Khumalo runs Khumalo & Grain, an eighteen-person furniture workshop in Cape Town. She builds handcrafted hardwood furniture — dining tables, shelving units, desks, custom pieces — and sells through three channels: retail partners, direct commissions, and online. Revenue is up, but she can't pin down which channel is actually making her money. Retailers take 35-45%. Direct commissions eat her time. Online has shipping costs she hasn't tracked properly.

Her bookkeeper put together a year of sales data and a guide to what all the columns mean. Naledi wants someone to go through it and confirm what she suspects — or tell her where she's wrong.

Read her email. Four questions, one dataset, one deadline. Everything you compute in this project connects back to what Naledi needs.

Step 2: Project setup

Open a terminal and start Claude Code:

cd ~/dev
claude

Paste this prompt:

Set up my project:
1. Create ~/dev/analytics/p1
2. Download the project materials from https://learnbydirectingai.dev/materials/analytics/p1/materials.zip and extract them into that folder
3. Read CLAUDE.md — it's the project governance file

Claude will create the folder, download and extract the materials, and read through CLAUDE.md. That file describes the client, the deliverables, the tech stack, and the ticket list for the entire project. It is the project governance file — every ticket, every verification target, every file reference points back to it.

Once Claude confirms it has read CLAUDE.md, you are set up.

Step 3: The data dictionary

Open materials/data-dictionary.md. This is the column contract — it tells you exactly what each column means, what values are allowed, and what the constraints are.

Twelve columns. The ones that matter most for this project:

Column What it means
order_date When the order was placed (twelve months, January-December 2024)
product_type Six categories: dining table, shelving, desk, custom, side table, bench
channel Three values: retail, commission, online
sale_amount Sale price in ZAR. Negative values are refunds.
retailer_commission_rate 35%, 40%, or 45% — retail channel only, NULL otherwise
material_cost Raw material cost per piece
production_days Workshop days to build the piece. Dining tables: 16-20 days. Shelving: 5-6.
delivery_cost Varies by region. R0 for in-store pickup. Up to R3,500 for national delivery.

Pay attention to sale_amount. Negative values represent refunds — 25 rows out of 845, roughly 3% of the dataset. That detail will matter when you compute revenue later. If you sum the entire column without filtering, you get a number that includes refund deductions. The metric definitions say revenue excludes refunds. The difference between those two numbers is exactly the kind of thing AI will get wrong if nobody checks.

The data dictionary exists because column names are ambiguous. sale_amount could mean gross, net, or something else entirely. production_days could mean calendar days or working days. The dictionary resolves that ambiguity. Reading it before querying the data is not a formality — it is how you avoid computing the wrong thing confidently.

Step 4: The analysis spec

Open materials/analysis-spec.md. This is the analytical plan — it maps Naledi's four business questions to specific metrics, chart types, and verification targets.

Four questions:

  1. Which sales channel is most profitable after all costs? Metric: channel profit margin. Chart: bar chart of net revenue by channel, grouped bar chart of profit margin by channel and product type.
  2. Which product types generate the most revenue per workshop-week? Metric: revenue per workshop-week. Chart: bar chart by product type.
  3. Are there channel-product patterns? Cross-tabulation of product types by channel.
  4. What is the monthly revenue trend? Metric: monthly net revenue. Chart: line chart, twelve data points.

The spec also includes verification targets — known-correct values for every metric. When AI computes a number, you check it against these targets. If the numbers do not match, the computation is wrong. The targets are not suggestions. They are the answer key.

Notice the structure: a question, a metric that answers it, a chart that communicates it, and a target that verifies it. That sequence — question, metric, chart, verification — is the analytical workflow. This project walks through it once, start to finish.

Step 5: Metric definitions

Open materials/metric-definitions.md. This file defines precisely how each metric is computed.

Three definitions matter:

Net channel revenueSUM(sale_amount) WHERE sale_amount > 0, grouped by channel. Revenue excludes refund rows. Including refunds understates revenue and misrepresents channel performance.

Channel profit margin — the percentage of revenue retained after retailer commissions, material costs, and delivery costs. Retailers take 35-45%, so the retail channel has the highest gross revenue but the lowest margin. Commission and online channels have no retailer cut.

Revenue per workshop-week — revenue normalized by production time. SUM(sale_amount WHERE sale_amount > 0) / (SUM(production_days) / 5), grouped by product type. A dining table that sells for R25,000 but takes four weeks to build generates less revenue per workshop-week than shelving that takes one week. This is the metric Naledi has never seen — the one that changes how she thinks about what to build.

These definitions are decisions someone already made. "Revenue" is not a self-evident concept. It is a specification: which rows count, which do not, what gets subtracted. When AI computes SUM(sale_amount) without reading the definition, it makes its own decision — and that decision may not match what the business means.

✓ Check

Check: According to the data dictionary, what does a negative value in the sale_amount column represent? (Answer: a refund.) According to the metric definitions, does "channel revenue" include or exclude refunds? (Answer: excludes refunds.)