The Brief
Siobhan Murray is Head of Operations at Verdant Packaging, a sustainable packaging manufacturer outside Cork, Ireland. Forty-five employees. They make compostable food containers, paper-based mailer bags, and biodegradable industrial wrapping. Growing fast -- EU packaging regulations are pushing clients away from plastic, and Verdant is one of the beneficiaries.
The data is spread across four systems in four formats. Production floor logs are in one system, sales in another, raw materials procurement in spreadsheets, and quality testing results come from the lab. Siobhan suspects the food container line is thin on margins because of material costs, but she cannot see the full picture. She wants to understand which product lines are actually profitable, track quality metrics, and spot problems before they show up in the end-of-month numbers.
Your Role
You're pulling together data from four sources in four formats and building something Siobhan can use to make operational decisions. The data arrives as Parquet, JSON, CSV, and CSV-from-PDF. Each format carries different risks. The workflow that handled single-source CSV analysis has to expand to cover all four.
The metrics you build are not flat definitions. Production efficiency decomposes into components -- availability, performance (speed), and quality -- where changing one component's definition cascades to everything above it. You design the hierarchy, document the dependencies, and deliver an argument, not a list.
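The cascade is easiest to see numerically. A minimal sketch of the standard OEE decomposition (all shift figures below are hypothetical): the three components multiply into one top-line number, so redefining any one of them reprices everything above it.

```python
# Hypothetical shift figures -- illustration only.
planned_minutes = 480
downtime_minutes = 48
ideal_rate = 100          # units per minute at nameplate speed
total_units = 41_040
good_units = 38_988

run_minutes = planned_minutes - downtime_minutes
availability = run_minutes / planned_minutes            # 0.90
performance = total_units / (run_minutes * ideal_rate)  # 0.95
quality = good_units / total_units                      # 0.95

# OEE = availability x performance x quality
oee = availability * performance * quality
print(f"OEE = {oee:.1%}")  # → OEE = 81.2%
```

Tighten the definition of "good units" and only `quality` moves directly, but the top-line OEE moves with it -- which is why the hierarchy and its dependencies need to be documented, not just computed.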
AI handles every format, but it makes different assumptions about each one. It flattens nested JSON in ways that lose data. It infers schema types from the first few rows and misses inconsistencies further down. Your verification work is now format-specific.
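The JSON flattening risk is concrete. A minimal sketch (the order contents are invented for illustration): keeping only the first element of a line-item array silently understates revenue, while exploding to one row per item preserves it.

```python
# Hypothetical order with a nested customer object and a line-item array.
order = {
    "order_id": "SO-1042",
    "customer": {"id": "C-88", "county": "Cork"},
    "line_items": [
        {"sku": "COMP-TRAY-250", "qty": 400, "unit_price": 0.18},
        {"sku": "MAILER-A4", "qty": 150, "unit_price": 0.42},
    ],
}

# Naive flatten: top-level scalars plus the FIRST line item only.
# The second item vanishes -- and so does its revenue.
naive = {
    "order_id": order["order_id"],
    "customer_id": order["customer"]["id"],
    **order["line_items"][0],
}

# Safer: explode to one row per line item, carrying the order keys along.
rows = [
    {"order_id": order["order_id"],
     "customer_id": order["customer"]["id"],
     **item}
    for item in order["line_items"]
]

naive_revenue = naive["qty"] * naive["unit_price"]
true_revenue = sum(r["qty"] * r["unit_price"] for r in rows)
print(naive_revenue, true_revenue)  # → 72.0 135.0
```

A row-count and revenue-total reconciliation against the raw JSON is the format-specific check that catches this class of loss.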
What's New
Last time, you framed Wei's business questions as testable hypotheses, chose the right statistical test for binary outcome data, and reported confidence intervals instead of point estimates. You specified constraints before computation and crossed from descriptive to inferential statistics.
This time, the data itself is the new challenge. Four formats, each with different risks. Metrics that exist in hierarchies -- one number that decomposes into three, where changing one changes the rest. Your findings need to be structured as a professional argument, not a chronological list. And you will use a second AI to review your first AI's analysis -- cross-model verification.
The hard part is not any single source. It is connecting four sources with different formats, freshness, and granularity into a coherent analysis, and then communicating that analysis as a structured argument.
Tools
- Python 3.11+ (via Miniconda, "analytics" environment)
- DuckDB
- Polars (new -- for scale-appropriate data handling)
- Jupyter Notebook
- pandas
- scipy.stats, statsmodels
- matplotlib / seaborn
- Metabase (via Docker -- continuing from previous projects)
- Docker
- Claude Code
- Git / GitHub
Materials
- Production logs -- Parquet file from the manufacturing execution system, about 50,000 rows of shift-level production data.
- Sales data -- JSON export from the e-commerce API with nested customer objects and line item arrays. About 12,000 orders.
- Procurement records -- CSV from finance spreadsheets with monthly supplier costs, lead times, and quality grades.
- Quality results -- CSV derived from PDF lab reports with batch-level test results, pass/fail, and moisture readings.
- Data dictionary -- describes all four sources, their formats, date ranges, and freshness status.
- Metric hierarchy template -- introduces OEE decomposition, leading/lagging indicators, and hierarchy documentation.
- CLAUDE.md -- project governance file with client context, work breakdown, and verification targets.