Step 1: Siobhan's messages
Open the chat with Siobhan Murray. She's Head of Operations at Verdant Packaging, a sustainable packaging manufacturer outside Cork, Ireland. Three Slack messages, short and direct.
The company makes compostable food containers, paper-based mailer bags, and biodegradable industrial wrapping. Forty-five employees. Growing fast because EU packaging regulations are pushing everyone away from plastic. Her data is spread across four systems in a mix of formats, and she's been making decisions on gut feeling.
Read all three messages. She needs help pulling together production and sales data. She suspects the food container line has thin margins but cannot see the full picture. She warns: "CSV, JSON, Excel, and PDF lab reports."
Step 2: Discover what Siobhan has
Message Siobhan to understand her data situation. She is a primary client -- she shares what you ask for, but she will not volunteer everything. No suggested message buttons at this point. You initiate the conversation.
Ask about her data sources. Through conversation, you discover the four systems:
- Production logs -- Parquet files from their manufacturing execution system. Shift-level data for all three production lines.
- Sales data -- JSON from their e-commerce API. Orders with nested customer objects and line item arrays.
- Procurement records -- CSV exports from finance spreadsheets. Monthly supplier costs and lead times.
- Quality test results -- CSV derived from PDF lab reports. Batch-level quality metrics.
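Once you have the exports, the four sources can be loaded through one entry point. A minimal sketch with pandas, assuming a `load_source` helper name of my own and a pyarrow (or fastparquet) install for the Parquet file -- neither is prescribed by the materials:

```python
import pandas as pd
from pathlib import Path

def load_source(path: str) -> pd.DataFrame:
    """Dispatch on file extension to the matching pandas reader."""
    p = Path(path)
    if p.suffix == ".parquet":
        return pd.read_parquet(p)  # requires pyarrow or fastparquet
    if p.suffix == ".json":
        return pd.read_json(p)
    if p.suffix == ".csv":
        return pd.read_csv(p)
    raise ValueError(f"unsupported format: {p.suffix}")
```

A single loading path like this keeps format-specific quirks in one place instead of scattered across the analysis.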
Siobhan is conversational but efficient. Every message has a point. She uses "grand" in the Irish sense -- "that's grand, I'll get you the data exports" means "okay, acceptable."
Ask about her priorities. She wants three things: product-line profitability (are we making money on the right products?), quality metric tracking (are we getting better or worse?), and early warning indicators (spot problems before end-of-month surprises).
Note what she does not mention. She has hidden constraints -- penalty clauses on supermarket contracts, a single supplier for PLA resin, a separate waste tracking spreadsheet, a Monday morning quality problem. These only surface if you ask the right questions later.
Step 3: Project setup
Open a terminal and start Claude Code:
cd ~/dev
claude
Paste this prompt:
Create the folder ~/dev/analytics/p7. Download the project materials from https://learnbydirectingai.dev/materials/analytics/p7/materials.zip and extract them into that folder. Read CLAUDE.md -- it's the project governance file.
Claude creates the folder, downloads the materials, and reads CLAUDE.md. That file describes Siobhan's situation, the deliverables, the tech stack, and the work breakdown. Once Claude confirms it has read CLAUDE.md, you are set up.
Step 4: The data dictionary
Open materials/data-dictionary.md. It describes all four data sources: what each contains, what format it is in, how frequently it is updated, and what date ranges it covers.
Four sections, one per source. Notice the differences between them. The production logs are Parquet with 18 months of shift-level data, updated this week. The sales data is JSON with nested structures, covering the same 18 months -- but last updated three weeks ago. That gap matters. An analysis built on sales data that is three weeks stale produces conclusions about a business that has already moved on.
The procurement records are CSV with 24 months of monthly data, updated this month. The quality results are CSV derived from PDF lab reports, 18 months, updated this week.
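The freshness gap is worth checking mechanically rather than by eye. A sketch of that check -- the dates and the 14-day threshold are illustrative assumptions, not values from the data dictionary:

```python
from datetime import date, timedelta

# Illustrative "today" and last-update dates; in practice you would
# read these from the data dictionary or from file metadata.
today = date(2026, 1, 30)
last_updated = {
    "production-logs.parquet": today - timedelta(days=2),
    "sales-data.json": today - timedelta(days=21),   # three weeks stale
    "procurement-records.csv": today - timedelta(days=12),
    "quality-results.csv": today - timedelta(days=3),
}

THRESHOLD_DAYS = 14  # assumed tolerance; tune to the client's cadence
stale = sorted(name for name, updated in last_updated.items()
               if (today - updated).days > THRESHOLD_DAYS)
print(stale)  # → ['sales-data.json']
```

Only the sales export trips the threshold, which is exactly the gap the data dictionary reveals.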
Each format carries different risks. JSON has nested structures that flatten inconsistently. Parquet carries schema metadata that can be stale. CSV has its own quirks -- encoding, delimiters, quoted fields. These are not the same validation concerns you had with single-source CSV data in previous projects.
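The JSON flattening risk is concrete. A sketch of how nested orders like the e-commerce export might flatten with `pandas.json_normalize` -- the field names here are invented for illustration, not taken from Siobhan's actual schema:

```python
import pandas as pd

# Invented order shape: nested customer object plus a line-item array,
# mirroring the structure the data dictionary describes.
orders = [
    {"order_id": "A-100",
     "customer": {"id": "C-7", "country": "IE"},
     "items": [{"sku": "FC-01", "qty": 200},
               {"sku": "MB-02", "qty": 50}]},
]

# record_path explodes the line-item array into rows; meta carries the
# order and customer identifiers down onto each row.
flat = pd.json_normalize(orders, record_path="items",
                         meta=["order_id", ["customer", "id"]])
print(flat.columns.tolist())
```

Note that the nested `customer.id` column name is a choice `json_normalize` makes for you; a different flattening strategy would name it differently, which is exactly the inconsistency to watch for.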
Step 5: Identify the data files
Direct AI to list the files in the materials directory:
List all files in materials/ and show me the file extension and size for each data file.
You should see six data and reference files: production-logs.parquet, sales-data.json, procurement-records.csv, quality-results.csv, data-dictionary.md, and metric-hierarchy-template.md, plus the CLAUDE.md governance file.
Three different file formats for data files. Until now, every project used CSV. This project has Parquet, JSON, and CSV together. Each format stores data differently, which means each format fails differently -- and AI handles each one with different assumptions you will need to verify.
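You can verify the format count yourself rather than trusting the listing. A small sketch over the file names given above:

```python
from collections import Counter

# Data file names from the materials listing in Step 5.
data_files = ["production-logs.parquet", "sales-data.json",
              "procurement-records.csv", "quality-results.csv"]

# Count extensions: four data files, three distinct formats.
formats = Counter(name.rsplit(".", 1)[1] for name in data_files)
print(dict(formats))  # → {'parquet': 1, 'json': 1, 'csv': 2}
```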
Step 6: Document priorities from the conversation
Before moving on, document what you know from the Siobhan conversation. In your notebook or a markdown file, record:
- Client priorities -- product-line profitability, quality metric tracking, early warning indicators
- Data sources -- production logs (Parquet), sales data (JSON), procurement records (CSV), quality results (CSV)
- Open questions -- anything Siobhan mentioned but did not fully explain, and anything you suspect she has not mentioned yet
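One way to make these notes reusable is to write them to a file you can paste into later sessions. A sketch -- the `project-context.md` path and wording are my own, not part of the materials:

```python
from pathlib import Path

# Hypothetical context file capturing the Step 6 notes.
notes = """# Verdant Packaging -- project context

## Client priorities
- Product-line profitability
- Quality metric tracking
- Early warning indicators

## Data sources
- Production logs (Parquet), sales (JSON), procurement (CSV), quality (CSV)

## Open questions
- Anything Siobhan mentioned but did not fully explain
- Anything you suspect she has not mentioned yet
"""
Path("project-context.md").write_text(notes)
```

Keeping this in the project folder means every future prompt can start from the same curated context.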
This documentation becomes part of your project context. When you direct AI later, including the client's priorities in your prompt changes what AI produces. Context curation starts here -- deciding what matters enough to carry forward into every session.
✓ Check: Four data files in three formats; data dictionary covers all sources; priorities documented