data-dictionary.md

Data Dictionary: Verdant Packaging

Four data sources from four systems: one Parquet file, one JSON export, and two CSV files (one of them transcribed from PDF lab reports).


1. Production Logs (Parquet)

File: production-logs.parquet
Source: Manufacturing Execution System (MES)
Format: Apache Parquet (columnar storage with schema metadata)
Date range: 18 months (approximately January 2025 through June 2026)
Last updated: This week
Approximate rows: 50,000

| Column | Type | Description |
| --- | --- | --- |
| timestamp | datetime | Production shift start time |
| production_line_id | string | LINE-A (food containers), LINE-B (mailer bags), LINE-C (industrial wrapping) |
| product_type | string | food_container, mailer_bag, industrial_wrap |
| planned_time_minutes | integer | Scheduled production time for the shift (typically 420-480) |
| actual_time_minutes | integer | Actual production time (accounts for downtime) |
| units_produced | integer | Total units produced during the shift |
| rated_capacity | integer | Maximum possible output at the line's rated speed |
| downtime_minutes | integer | Unplanned downtime during the shift |
| downtime_reason | string | none, maintenance, material_shortage, calibration, equipment_failure |

Notes: Three production lines run two shifts per day. LINE-A produces food containers (PLA-based, highest quality requirements). LINE-B produces mailer bags (paper-based, highest volume). LINE-C produces industrial wrapping (recycled content).
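As a minimal sketch of how the numeric production columns relate to one another, the snippet below derives two ratios for a single shift record. The class name `ShiftRecord`, the metric names `time_utilization` and `capacity_utilization`, and the sample values are all assumptions made for illustration. None of them are fields or metrics defined by the MES, and loading the Parquet file itself is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """Numeric columns of one production-logs row (hypothetical container type)."""
    planned_time_minutes: int
    actual_time_minutes: int
    units_produced: int
    rated_capacity: int
    downtime_minutes: int


def shift_ratios(rec: ShiftRecord) -> dict:
    """Return illustrative ratios; the metric names are assumptions, not MES fields."""
    return {
        # Share of the scheduled time that was actually spent producing.
        "time_utilization": rec.actual_time_minutes / rec.planned_time_minutes,
        # Share of the line's rated output that was actually produced.
        "capacity_utilization": rec.units_produced / rec.rated_capacity,
    }


# Hypothetical sample values, chosen only to exercise the function.
sample = ShiftRecord(
    planned_time_minutes=480,
    actual_time_minutes=432,
    units_produced=9000,
    rated_capacity=12000,
    downtime_minutes=48,
)
ratios = shift_ratios(sample)
print(ratios)
```

Rows where `planned_time_minutes` or `rated_capacity` is zero would need to be filtered out before calling a function like this.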


2. Sales Data (JSON)

File: sales-data.json
Source: E-commerce platform API export
Format: JSON array (nested structure with customer objects and line item arrays)
Date range: 18 months (approximately January 2025 through June 2026)
Last updated: Three weeks ago (the API export runs monthly and the last export was delayed)
Approximate records: 12,000 orders

Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| order_id | string | Unique order identifier (sequential) |
| order_date | string (ISO date) | Date the order was placed |
| customer | object | Customer details (see nested structure below) |
| line_items | array | Products ordered (see nested structure below) |
| delivery_status | string | delivered, delayed, partial |

Customer object (nested)

| Field | Type | Description |
| --- | --- | --- |
| customer_id | string | Unique customer identifier |
| customer_type | string | food_producer, ecommerce_brand, supermarket_chain |
| channel | string | direct, distributor, online |

Line items array (nested, 1-3 items per order)

| Field | Type | Description |
| --- | --- | --- |
| product_type | string | food_container, mailer_bag, industrial_wrap |
| quantity | integer | Units ordered |
| unit_price | float | Price per unit in EUR |
| total | float | Line item total (quantity × unit_price) |

Notes: The nested structure means each order can contain multiple product types. Supermarket chain orders tend to be larger in volume, with tighter delivery requirements. Because the last export is three weeks old, the most recent three weeks of production data have no matching sales data.
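The nested layout described above can be flattened into one row per line item before it is compared with the tabular sources. The following is a minimal sketch using only the standard library. The sample orders, the identifier formats (`ORD-0001`, `C-001`), and the function name `flatten_orders` are hypothetical; only the field names come from the tables above.

```python
import json

# Hypothetical two-order sample shaped like sales-data.json.
raw = """
[
  {"order_id": "ORD-0001", "order_date": "2025-01-15",
   "customer": {"customer_id": "C-001", "customer_type": "food_producer",
                "channel": "direct"},
   "line_items": [
     {"product_type": "food_container", "quantity": 1000,
      "unit_price": 0.25, "total": 250.0},
     {"product_type": "mailer_bag", "quantity": 500,
      "unit_price": 0.10, "total": 50.0}
   ],
   "delivery_status": "delivered"},
  {"order_id": "ORD-0002", "order_date": "2025-01-16",
   "customer": {"customer_id": "C-002", "customer_type": "supermarket_chain",
                "channel": "distributor"},
   "line_items": [
     {"product_type": "industrial_wrap", "quantity": 200,
      "unit_price": 1.50, "total": 300.0}
   ],
   "delivery_status": "delayed"}
]
"""


def flatten_orders(orders):
    """Return one flat dict per line item, repeating the order and customer fields."""
    rows = []
    for order in orders:
        customer = order["customer"]
        for item in order["line_items"]:
            rows.append({
                "order_id": order["order_id"],
                "order_date": order["order_date"],
                "delivery_status": order["delivery_status"],
                "customer_id": customer["customer_id"],
                "customer_type": customer["customer_type"],
                "channel": customer["channel"],
                "product_type": item["product_type"],
                "quantity": item["quantity"],
                "unit_price": item["unit_price"],
                "total": item["total"],
            })
    return rows


rows = flatten_orders(json.loads(raw))
for row in rows:
    print(row["order_id"], row["product_type"], row["total"])
```

After flattening, order-level fields such as `delivery_status` repeat on every line item, so counts of orders should be taken over distinct `order_id` values rather than over rows.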


3. Procurement Records (CSV)

File: procurement-records.csv
Source: Finance team spreadsheets (exported monthly)
Format: CSV
Date range: 24 months (approximately July 2024 through June 2026)
Last updated: This month
Approximate rows: 500

| Column | Type | Description |
| --- | --- | --- |
| month | string (YYYY-MM) | Month of the purchase |
| supplier | string | Supplier company name |
| material_type | string | pla_resin, paper_stock, recycled_content, adhesive, ink, packaging_film |
| quantity_kg | integer | Kilograms purchased |
| unit_cost_eur | float | Cost per kilogram in EUR |
| total_cost_eur | float | Total purchase cost (quantity_kg × unit_cost_eur) |
| lead_time_days | integer | Days from order to delivery |
| quality_grade | string | A (excellent), B (acceptable), C (below standard) |

Notes: PLA resin (used for food containers) comes from a single supplier. Lead times for PLA are significantly longer than other materials. Quality grade C deliveries have occasionally been linked to production quality issues downstream.
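Because total_cost_eur is defined as quantity_kg × unit_cost_eur, the relationship can be checked row by row. The sketch below assumes the CSV has a header row whose names match the column names above; the supplier names, the sample figures, the function name `find_total_mismatches`, and the 0.01 EUR tolerance are hypothetical.

```python
import csv
import io
import math

# Hypothetical sample; the second row carries a deliberately wrong total.
sample_csv = """month,supplier,material_type,quantity_kg,unit_cost_eur,total_cost_eur,lead_time_days,quality_grade
2025-01,Supplier A,pla_resin,2000,3.10,6200.00,45,A
2025-01,Supplier B,paper_stock,5000,0.80,4100.00,10,B
"""


def find_total_mismatches(csv_text, tolerance=0.01):
    """Return (month, supplier, expected_total) for rows whose total is inconsistent."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = int(row["quantity_kg"]) * float(row["unit_cost_eur"])
        recorded = float(row["total_cost_eur"])
        # Compare with an absolute tolerance to absorb float rounding.
        if not math.isclose(expected, recorded, abs_tol=tolerance):
            mismatches.append((row["month"], row["supplier"], expected))
    return mismatches


mismatches = find_total_mismatches(sample_csv)
print(mismatches)
```

The same check applies to the `total` field in the sales line items, with `quantity` and `unit_price` in place of the procurement columns.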


4. Quality Results (CSV, derived from PDF lab reports)

File: quality-results.csv
Source: Quality lab reports (originally PDF, transcribed to CSV by the quality manager)
Format: CSV
Date range: 18 months (approximately January 2025 through June 2026)
Last updated: This week
Approximate rows: 3,000

| Column | Type | Description |
| --- | --- | --- |
| batch_id | string | Unique batch identifier (sequential) |
| production_date | date | Date the batch was produced |
| production_line_id | string | LINE-A, LINE-B, LINE-C |
| product_type | string | food_container, mailer_bag, industrial_wrap |
| test_type | string | tensile_strength, moisture_barrier, compostability, weight_tolerance, seal_integrity |
| result_value | float | Numeric test result (units vary by test type) |
| pass_fail | string | pass, fail |
| pla_moisture_pct | float | PLA moisture content percentage (1.5-6.0, primarily relevant for food containers) |
| notes | string | Usually empty. Occasional "reprocessed" or "batch disposed" entries |

Notes: Each batch is tested for multiple quality criteria. The pla_moisture_pct field is measured for all batches but is most critical for food containers (LINE-A), where high moisture content affects seal integrity. Batches marked "reprocessed" or "batch disposed" in the notes field represent waste costs that are tracked separately from the production system.
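One way to work with the multiple tests per batch is to roll the rows up into a per-batch summary that counts failures and flags waste entries. The sketch below assumes the file holds one row per batch and test type, which the `test_type` column suggests but the dictionary does not state outright. The batch identifiers, the sample rows, and the function name `summarize_batches` are hypothetical.

```python
from collections import defaultdict

# The two notes values the dictionary associates with waste.
WASTE_NOTES = {"reprocessed", "batch disposed"}

# Hypothetical rows, reduced to the columns this summary needs.
quality_rows = [
    {"batch_id": "B-0001", "production_line_id": "LINE-A",
     "test_type": "seal_integrity", "pass_fail": "pass", "notes": ""},
    {"batch_id": "B-0001", "production_line_id": "LINE-A",
     "test_type": "moisture_barrier", "pass_fail": "fail", "notes": "reprocessed"},
    {"batch_id": "B-0002", "production_line_id": "LINE-B",
     "test_type": "tensile_strength", "pass_fail": "pass", "notes": ""},
]


def summarize_batches(rows):
    """Count tests and failures per batch and flag batches with a waste note."""
    summary = defaultdict(lambda: {"tests": 0, "failed": 0, "waste": False})
    for row in rows:
        entry = summary[row["batch_id"]]
        entry["tests"] += 1
        if row["pass_fail"] == "fail":
            entry["failed"] += 1
        # Normalise case and whitespace, since the notes were transcribed by hand.
        if row["notes"].strip().lower() in WASTE_NOTES:
            entry["waste"] = True
    return dict(summary)


summary = summarize_batches(quality_rows)
for batch_id, entry in summary.items():
    print(batch_id, entry)
```

Because the notes were transcribed from PDF reports, variants in spelling or punctuation may exist; the exact-match set above would miss them.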


Data Freshness

| Source | Last updated | Freshness status |
| --- | --- | --- |
| Production logs | This week | Current |
| Sales data | 3 weeks ago | Stale; recent production has no matching sales |
| Procurement records | This month | Current |
| Quality results | This week | Current |

The sales data lag means any analysis connecting recent production to sales revenue will have a gap. This should be noted in findings that depend on the production-to-sales relationship.
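The gap can be made explicit by taking the latest order date in the sales export as a cutoff and separating production shifts that fall after it. This is a minimal sketch under that assumption; the function names `sales_cutoff` and `split_by_sales_coverage` and all sample dates are hypothetical.

```python
from datetime import date, datetime


def sales_cutoff(order_dates):
    """Return the latest order_date (ISO strings) present in the sales export."""
    return max(date.fromisoformat(d) for d in order_dates)


def split_by_sales_coverage(shift_timestamps, cutoff):
    """Split shift start times into those on or before the cutoff and those after it."""
    covered, uncovered = [], []
    for ts in shift_timestamps:
        if ts.date() <= cutoff:
            covered.append(ts)
        else:
            uncovered.append(ts)
    return covered, uncovered


# Hypothetical sample values.
order_dates = ["2026-05-20", "2026-06-02", "2026-06-09"]
shifts = [
    datetime(2026, 6, 8, 6, 0),
    datetime(2026, 6, 9, 14, 0),
    datetime(2026, 6, 25, 6, 0),
]

cutoff = sales_cutoff(order_dates)
covered, uncovered = split_by_sales_coverage(shifts, cutoff)
print("sales data ends:", cutoff)
print("shifts without matching sales:", len(uncovered))
```

Reporting the cutoff date alongside any production-to-sales finding states the gap concretely.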