Analytics P7: Systematic Quality Assessment -- Verdant Packaging
Client
Siobhan Murray, Head of Operations at Verdant Packaging. Sustainable packaging manufacturer outside Cork, Ireland. 45 employees. Makes compostable food containers, paper-based mailer bags, and biodegradable industrial wrapping.
What you're building
A multi-source quality assessment with metric hierarchies, cross-source profitability analysis, a leading/lagging indicator dashboard, and an analytical narrative. Siobhan needs to understand which product lines are actually profitable, track quality metrics, and spot problems before they happen. Data arrives in four formats from four systems.
Tech stack
- Python 3.11+ (Miniconda, "analytics" environment)
- DuckDB (primary data engine)
- Polars (scale-appropriate handling for the larger datasets)
- Jupyter Notebook
- pandas
- scipy.stats, statsmodels
- matplotlib / seaborn
- Metabase (via Docker)
- Git / GitHub
File structure
~/dev/analytics/p7/
  materials/
    CLAUDE.md                     <- this file
    data-dictionary.md            <- describes all four data sources
    metric-hierarchy-template.md  <- OEE decomposition template
    production-logs.parquet       <- production data (~50,000 rows)
    sales-data.json               <- sales API export (~12,000 records)
    procurement-records.csv       <- monthly supplier records (~500 rows)
    quality-results.csv           <- batch quality test results (~3,000 rows)
  notebooks/                      <- Jupyter analysis notebooks
  output/                         <- analysis outputs, dashboard exports
Data sources
- Production logs (Parquet) -- timestamps, production line, output, downtime, capacity
- Sales data (JSON) -- nested orders with customer info and line items
- Procurement records (CSV) -- monthly supplier costs, lead times, quality grades
- Quality results (CSV) -- batch-level quality test results with pass/fail
Key rules
- Always deduplicate before aggregating
- Revenue means total from sales-data.json line_items
- OEE = Availability x Performance Rate x Quality Rate (define each separately)
- PLA moisture > 4% correlates with batch failures -- this is a leading indicator
- The sales data is 3 weeks stale -- note this in any analysis that uses it
- When joining across sources, verify product type identifiers match
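Two of the rules above have a concrete shape worth pinning down early: revenue as the sum over nested line_items, and OEE as the product of three separately defined rates. The sketch below uses made-up numbers and assumed field names (qty, unit_price); the real inputs come from sales-data.json and production-logs.parquet.

```python
# Revenue: total of qty * unit_price across every order's line items.
orders = [
    {"order_id": 1, "line_items": [{"sku": "MAILER-S", "qty": 100, "unit_price": 0.42}]},
    {"order_id": 2, "line_items": [{"sku": "TRAY-L", "qty": 50, "unit_price": 0.80},
                                   {"sku": "WRAP-IND", "qty": 10, "unit_price": 3.10}]},
]
revenue = sum(li["qty"] * li["unit_price"]
              for order in orders for li in order["line_items"])
print(round(revenue, 2))  # 42.0 + 40.0 + 31.0 = 113.0

# OEE = Availability x Performance Rate x Quality Rate
def oee(planned_min, downtime_min, actual_units, ideal_units, good_units):
    availability = (planned_min - downtime_min) / planned_min  # uptime share of planned time
    performance = actual_units / ideal_units                   # output vs. ideal rate while running
    quality = good_units / actual_units                        # defect-free share of output
    return availability * performance * quality

# One hypothetical shift: 480 planned minutes, 48 of downtime,
# 900 units against an ideal of 1,000, 873 passing QC.
print(round(oee(480, 48, 900, 1000, 873), 3))  # 0.9 * 0.9 * 0.97 = 0.786
```

Defining the three OEE components as separate named quantities (rather than one fused ratio) is what makes the Unit 3 hierarchy and the Metabase drill-down possible.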
Work breakdown
- Unit 1: Client discovery, project setup, data source identification
- Unit 2: Format-specific profiling and validation (Parquet, JSON, CSV)
- Unit 3: OEE metric hierarchy design with leading/lagging indicators
- Unit 4: Cross-source profitability analysis with cross-model review
- Unit 5: Metabase dashboard with OEE hierarchy and leading indicators
- Unit 6: Analytical narrative (professional argument), power analysis, project close
Verification targets
- All four data sources loaded and profiled with format-specific validation
- OEE hierarchy with three defined components (Availability, Performance Rate, Quality Rate)
- At least 3 leading and 3 lagging indicators identified
- Product-line profitability across all three lines with full cost breakdown
- Cross-model review completed with documented agreements and divergences
- Metabase dashboard with OEE drill-down, profitability, and indicator panels
- Analytical narrative structured as conclusion-first argument with alternative explanations
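The moisture rule from the key rules can be validated as a leading indicator with a point-biserial correlation (scipy.stats is already in the stack). The readings below are invented; the real ones come from quality-results.csv joined to the procurement records.

```python
# Sketch: does PLA moisture correlate with batch failure?
from scipy.stats import pointbiserialr

moisture = [2.0, 2.5, 3.0, 3.5, 4.5, 5.0, 5.5, 6.0]  # % moisture per batch (invented)
failed   = [0,   0,   0,   0,   1,   1,   0,   1]     # 1 = batch failed QC

r, p = pointbiserialr(failed, moisture)
print(f"r = {r:.2f}")  # positive r supports moisture as a leading indicator

# The > 4% rule flags a batch before its quality result lands:
at_risk = [m > 4.0 for m in moisture]
print(sum(at_risk))  # 4 batches flagged
```

The same pattern — a measurable input that moves before the lagging outcome does — is what each of the other leading indicators should follow.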
Commit convention
Commit after each unit is complete. Use descriptive messages: "Unit 2: format-specific profiling complete", not "update notebook."