Step 1: Carmen's Slack thread
Carmen Reyes runs the commercial side at Bona Lluna Cervesa, a craft brewery outside Barcelona. Twenty-five people, twelve beer styles, three sales channels that don't talk to each other.
Open the chat with Carmen. Read her messages. Five requirements:
- Bring all three data sources together so the numbers match
- Build a dashboard she checks every Monday morning
- Show beer style performance by channel
- Add filters for time period, channel, and beer style
- Make it usable for her sales manager
Carmen is casual and direct. She sends short Slack messages in rapid succession rather than one long block. She'll go on tangents about brewing if you ask about product names.
Step 2: Project setup
Open a terminal and start Claude Code:
cd ~/dev
claude
Paste this prompt:
Create the folder ~/dev/analytics/p3. Download the project materials from https://learnbydirectingai.dev/materials/analytics/p3/materials.zip and extract them into that folder. Read CLAUDE.md -- it's the project governance file. If anything needs admin access, tell me what to run in a separate terminal.
Claude creates the folder, downloads the materials, and reads CLAUDE.md. That file describes the client, the deliverables, the tech stack, and the work breakdown. Once Claude confirms it has read CLAUDE.md, you are set up.
Step 3: Docker and Metabase
This project uses Metabase, a BI dashboard platform. It runs via Docker, which is a tool that packages applications so they run the same way on any machine.
Direct AI to start Metabase:
Start Metabase using the docker-compose.yml in the materials folder. Verify it's running at localhost:3000.
If Docker needs admin access, Claude will tell you what to paste in a separate terminal. Once Metabase is running, open http://localhost:3000 in your browser. You should see the Metabase setup wizard.
Complete the initial setup: choose a language, create an admin account. Skip the database connection for now -- you'll connect DuckDB later.
Metabase is different from the matplotlib charts you've built before. Instead of writing Python code that generates a chart, you configure a platform that serves an interactive dashboard. The analytical concerns are the same -- chart type, labels, data accuracy. The medium changes.
Step 4: The product catalog
Open materials/product_catalog.csv. This is Carmen's authoritative product list: canonical names, categories, ABV, costs per liter, and status.
Twelve active beer styles across six categories, plus two discontinued styles (Red Ale and Pilsner) that still appear in historical sales data. Three styles are seasonal -- they only brew in specific quarters.
The names in this file are canonical. "IPA" in the product catalog is the standard name. The POS system calls it "IPA" too, but the invoicing system calls it "BL IPA x24" and the online shop calls it "Bona Lluna IPA 330ml". Every other system maps to this catalog.
Notice the cost_per_liter_eur column. You'll need this when you define profitability.
Step 5: Profile the three sources
Direct AI to load all three data files into DuckDB and profile each one separately. Don't ask for one big analysis -- break it into focused requests.
Start with the taproom:
Load materials/taproom_pos.csv into DuckDB and profile it -- row count, column names and types, date range, value ranges for numeric columns, null counts, and distinct values for the product_name column.
Then do the same for materials/distribution_invoices.csv and materials/online_orders.csv.
The sequence matters. Profiling each source independently lets you compare them afterward. Asking AI to profile all three at once produces a summary where source-specific details get lost. AI commonly misjudges what level of detail you need -- a focused request for one source at a time keeps the output manageable.
Step 6: Compare the profiles
Now you have three profiles. Compare them.
The product names are different in every system. The taproom uses short names: "IPA", "Pale Ale". Distribution uses a branded pack format: "BL IPA x24", "BL Pale Ale x12". Online uses the full brand name with size: "Bona Lluna IPA 330ml", "Bona Lluna Pale Ale 6-Pack".
The date formats are different. The taproom uses YYYY-MM-DD. Distribution uses DD-Mon-YYYY (like "15-Jan-2024"). Online uses MM/DD/YYYY (US format from WooCommerce).
Count the distinct product names in each source and compare against the product catalog. Each source will have more distinct names than the twelve active styles in the catalog -- because of naming variants, size variants, and discontinued products still appearing in historical data.
The distribution data has a payment_type column with values "standard" and "consignment." The online data has a currency column with values "EUR" and "GBP." Both of these will matter when you define revenue.
Check: How many distinct product names appear in each source? How do those counts compare to the 12 styles in the product catalog?