Analytics P7: Systematic Quality Assessment -- Verdant Packaging
Client
Siobhan Murray, Head of Operations at Verdant Packaging. Sustainable packaging manufacturer outside Cork, Ireland. 45 employees. Makes compostable food containers, paper-based mailer bags, and biodegradable industrial wrapping.
What you're building
A multi-source quality assessment with metric hierarchies, cross-source profitability analysis, a leading/lagging indicator dashboard, and an analytical narrative. Siobhan needs to understand which product lines are actually profitable, track quality metrics, and spot problems before they happen. Data arrives in four formats from four systems.
Tech stack
- Python 3.11+ (Miniconda, "analytics" environment)
- DuckDB (primary data engine)
- Polars (scale-appropriate handling for the larger datasets)
- Jupyter Notebook
- pandas
- scipy.stats, statsmodels
- matplotlib / seaborn
- Metabase (via Docker)
- Git / GitHub
File structure
~/dev/analytics/p7/
  materials/
    CLAUDE.md                     <- this file
    data-dictionary.md            <- describes all four data sources
    metric-hierarchy-template.md  <- OEE decomposition template
    production-logs.parquet       <- production data (~50,000 rows)
    sales-data.json               <- sales API export (~12,000 records)
    procurement-records.csv       <- monthly supplier records (~500 rows)
    quality-results.csv           <- batch quality test results (~3,000 rows)
  notebooks/                      <- Jupyter analysis notebooks
  output/                         <- analysis outputs, dashboard exports
Data sources
- Production logs (Parquet) -- timestamps, production line, output, downtime, capacity
- Sales data (JSON) -- nested orders with customer info and line items
- Procurement records (CSV) -- monthly supplier costs, lead times, quality grades
- Quality results (CSV) -- batch-level quality test results with pass/fail
Key rules
- Always deduplicate before aggregating
- Revenue means total from sales-data.json line_items
- OEE = Availability x Performance Rate x Quality Rate (define each separately)
- PLA moisture > 4% correlates with batch failures -- this is a leading indicator
- The sales data is 3 weeks stale -- note this in any analysis that uses it
- When joining across sources, verify product type identifiers match
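Two of the rules above have a concrete shape worth pinning down early: revenue as the sum over nested line_items, and OEE as the product of three separately defined rates. The sketch below uses made-up numbers and assumed field names (qty, unit_price); the real inputs come from sales-data.json and production-logs.parquet.

```python
# Revenue: total of qty * unit_price across every order's line items.
orders = [
    {"order_id": 1, "line_items": [{"sku": "MAILER-S", "qty": 100, "unit_price": 0.42}]},
    {"order_id": 2, "line_items": [{"sku": "TRAY-L", "qty": 50, "unit_price": 0.80},
                                   {"sku": "WRAP-IND", "qty": 10, "unit_price": 3.10}]},
]
revenue = sum(li["qty"] * li["unit_price"]
              for order in orders for li in order["line_items"])
print(round(revenue, 2))  # 42.0 + 40.0 + 31.0 = 113.0

# OEE = Availability x Performance Rate x Quality Rate
def oee(planned_min, downtime_min, actual_units, ideal_units, good_units):
    availability = (planned_min - downtime_min) / planned_min  # uptime share of planned time
    performance = actual_units / ideal_units                   # output vs. ideal rate while running
    quality = good_units / actual_units                        # defect-free share of output
    return availability * performance * quality

# One hypothetical shift: 480 planned minutes, 48 of downtime,
# 900 units against an ideal of 1,000, 873 passing QC.
print(round(oee(480, 48, 900, 1000, 873), 3))  # 0.9 * 0.9 * 0.97 = 0.786
```

Defining the three OEE components as separate named quantities (rather than one fused ratio) is what makes the Unit 3 hierarchy and the Metabase drill-down possible.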
Work breakdown
- Unit 1: Client discovery, project setup, data source identification
- Unit 2: Format-specific profiling and validation (Parquet, JSON, CSV)
- Unit 3: OEE metric hierarchy design with leading/lagging indicators
- Unit 4: Cross-source profitability analysis with cross-model review
- Unit 5: Metabase dashboard with OEE hierarchy and leading indicators
- Unit 6: Analytical narrative (professional argument), power analysis, project close
Verification targets
- All four data sources loaded and profiled with format-specific validation
- OEE hierarchy with three defined components (Availability, Performance Rate, Quality Rate)
- At least 3 leading and 3 lagging indicators identified
- Product-line profitability across all three lines with full cost breakdown
- Cross-model review completed with documented agreements and divergences
- Metabase dashboard with OEE drill-down, profitability, and indicator panels
- Analytical narrative structured as conclusion-first argument with alternative explanations
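The moisture rule from the key rules can be validated as a leading indicator with a point-biserial correlation (scipy.stats is already in the stack). The readings below are invented; the real ones come from quality-results.csv joined to the procurement records.

```python
# Sketch: does PLA moisture correlate with batch failure?
from scipy.stats import pointbiserialr

moisture = [2.0, 2.5, 3.0, 3.5, 4.5, 5.0, 5.5, 6.0]  # % moisture per batch (invented)
failed   = [0,   0,   0,   0,   1,   1,   0,   1]     # 1 = batch failed QC

r, p = pointbiserialr(failed, moisture)
print(f"r = {r:.2f}")  # positive r supports moisture as a leading indicator

# The > 4% rule flags a batch before its quality result lands:
at_risk = [m > 4.0 for m in moisture]
print(sum(at_risk))  # 4 batches flagged
```

The same pattern — a measurable input that moves before the lagging outcome does — is what each of the other leading indicators should follow.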
Commit convention
Commit after each unit is complete. Use descriptive messages: "Unit 2: format-specific profiling complete", not "update notebook."