Step 1: Identify the sources for profitability
Siobhan's core question: "I can't see whether we're making money on the right products." To answer it, you need data from all four sources.
- Revenue by product line -- from the sales data (JSON). Order quantities and unit prices per product type.
- Material costs -- from the procurement records (CSV). Supplier costs per material type. PLA resin (food containers) is the most expensive. Paper stock (mailer bags) is cheapest.
- Production time -- from the production logs (Parquet). How long each line runs, how much it produces, what capacity it uses.
- Waste costs -- from the quality results (CSV). Batch failures mean disposed or reprocessed units. Each failure has a cost.
This is the first time you are joining data across four sources with different formats, freshness, and granularity. The choice of which source to use for which part of the analysis is itself an analytical decision. The production logs are current. The sales data is three weeks stale. The profitability calculation will combine both -- and you need to decide whether the staleness undermines the conclusion or whether the gap is small enough to accept.
Step 2: Join production and sales data
Start by connecting output volumes to revenue. The production logs tell you how much was produced. The sales data tells you what was sold and at what price.
Direct AI to join them:
Join the production logs and sales data on product_type. For each product line (food_container, mailer_bag, industrial_wrap), show total units produced, total units sold, and total revenue. Use the properly flattened sales data from Unit 2.
Check the join key. "Product type" must mean the same thing in both systems. If the production logs use "food_container" and the sales data uses "Food Container" or "FC", the join fails silently -- rows that should match do not. Verify that product identifiers are consistent across systems before trusting the join.
Step 3: Add procurement costs
Now add material costs per product line. Each product uses different raw materials at different costs.
Direct AI to calculate material cost per unit:
From the procurement records, calculate the average material cost per unit for each product type. Food containers use PLA resin. Mailer bags use paper stock. Industrial wrapping uses recycled content. Show the cost breakdown.
PLA resin is the expensive one -- EUR 3.20 to 4.50 per kilogram, sourced from a single supplier (ChemPlas GmbH) with lead times of 35 to 50 days. Paper stock runs EUR 0.80 to 1.20 from multiple suppliers. The material cost difference between product lines is significant and will show up in the profitability numbers.
Step 4: Discover the hidden waste costs
You have revenue, material costs, and production time. But there is a cost component missing: waste. Batch failures mean units are either disposed of or reprocessed, and both have costs.
Ask Siobhan about quality-related costs. She reveals the hidden constraint: "Oh, the waste tracking -- Darren in quality keeps that in a separate spreadsheet. The reprocessing costs aren't in the production system." She adds: "I probably should have mentioned that earlier -- the reprocessing costs are significant, especially on the food container line. Darren's numbers would change the picture."
The quality results data includes pass/fail status and some rows with "reprocessed" in the notes field. Use these to estimate waste costs per product line:
From the quality results, calculate the failure rate and reprocessing rate per product line. For each failed batch, estimate the waste cost as the material cost of the units in that batch. Show waste costs per product line.
The food container line should show the highest waste costs -- driven by both a higher failure rate and the higher cost of PLA resin per unit.
Now calculate per-product-line profitability: revenue minus material costs minus production time allocation minus waste costs.
Calculate profitability per product line: revenue minus material costs minus estimated production time costs minus waste costs. Show the breakdown for food containers, mailer bags, and industrial wrapping.
Step 5: Check for confounding variables
Look at the profitability results. AI may present a correlation between production volume and waste rate. Before accepting it, check: is high waste causing low profitability, or is the food container line's expensive PLA resin the real driver?
AI reports correlation without flagging the confound. For every correlation AI presents, ask: "What else could explain this?" The food container line has both the highest volume and the highest waste rate -- but the waste might be driven by PLA moisture problems rather than volume. The correlation is real. The causal story requires more investigation.
Direct AI to check:
Is the correlation between production volume and waste rate on the food container line confounded by PLA moisture content? Show the failure rate when PLA moisture is above 4% versus below 4%, controlling for production volume.
The answer should show that PLA moisture is the stronger predictor of batch failures than production volume. The correlation between volume and waste exists, but the mechanism runs through material quality, not scale.
Step 6: Cross-model review
This is the first time you use cross-model review -- directing a second AI to check the first AI's analysis.
The idea is straightforward: different models have different failure patterns. A finding both models agree on is more trustworthy than one only a single model produces. A point where they diverge tells you where the analysis needs more scrutiny.
Start a fresh AI session (or use a different model if available). Provide the metric definitions, data sources, join logic, and profitability results. Ask the second AI to review:
Review this product-line profitability analysis. Three product lines: food containers, mailer bags, industrial wrapping. Revenue comes from sales data (JSON). Material costs from procurement records (CSV). Production time from production logs (Parquet). Waste costs from quality results (CSV). [Include the numbers from your analysis.] Do you agree with the profitability ranking? Are there cost components or confounds that this analysis may have missed?
Compare the two assessments. Note where they agree -- both should identify the food container line as least profitable when waste costs are included. Note where they diverge -- they may attribute the profitability gap to different factors (material costs versus production time allocation versus waste), or one may flag a confound the other missed.
Document the cross-model review results in your working notes. Include: points of agreement, points of divergence, and how you resolved the divergences. This becomes part of your project memory -- future AI sessions can reference the cross-model findings.
✓ Check: Profitability shows three product lines with all cost components; cross-model review completed with agreements and divergences noted