Data Dictionary: Verdant Packaging
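Four data sources from four systems, delivered in three file formats: one Parquet file, one JSON file, and two CSV files (one of which was transcribed from PDF lab reports).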
1. Production Logs (Parquet)
File: production-logs.parquet
Source: Manufacturing Execution System (MES)
Format: Apache Parquet (columnar storage with schema metadata)
Date range: 18 months (approximately January 2025 through June 2026)
Last updated: This week
Approximate rows: 50,000
| Column | Type | Description |
|---|---|---|
| timestamp | datetime | Production shift start time |
| production_line_id | string | LINE-A (food containers), LINE-B (mailer bags), LINE-C (industrial wrapping) |
| product_type | string | food_container, mailer_bag, industrial_wrap |
| planned_time_minutes | integer | Scheduled production time for the shift (typically 420-480) |
| actual_time_minutes | integer | Actual production time for the shift, after downtime is deducted |
| units_produced | integer | Total units produced during the shift |
| rated_capacity | integer | Maximum possible output at the line's rated speed |
| downtime_minutes | integer | Unplanned downtime during the shift |
| downtime_reason | string | none, maintenance, material_shortage, calibration, equipment_failure |
Notes: Three production lines run two shifts per day. LINE-A produces food containers (PLA-based, highest quality requirements). LINE-B produces mailer bags (paper-based, highest volume). LINE-C produces industrial wrapping (recycled content).
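As a rough illustration of how the time and capacity columns can be combined, the sketch below derives two per-shift ratios. The formulas (availability as actual over planned time, performance as units produced over rated capacity), the function name `shift_ratios`, and the sample row are all assumptions made for illustration. None of them are definitions or values taken from the MES.

```python
def shift_ratios(row: dict) -> dict:
    """Return availability and performance ratios for one shift row.

    Assumed formulas (not confirmed by the MES documentation):
      availability = actual_time_minutes / planned_time_minutes
      performance  = units_produced / rated_capacity
    A ratio is None when its denominator is zero.
    """
    planned = row["planned_time_minutes"]
    capacity = row["rated_capacity"]
    return {
        "availability": row["actual_time_minutes"] / planned if planned else None,
        "performance": row["units_produced"] / capacity if capacity else None,
    }


# Hypothetical shift row using the column names from the table above.
sample_shift = {
    "production_line_id": "LINE-A",
    "product_type": "food_container",
    "planned_time_minutes": 480,
    "actual_time_minutes": 432,
    "units_produced": 9000,
    "rated_capacity": 12000,
    "downtime_minutes": 48,
    "downtime_reason": "calibration",
}

ratios = shift_ratios(sample_shift)
print(ratios)
```

Reading the real file would need a Parquet-capable library such as pandas or pyarrow. That step is left out here so the sketch runs with the standard library alone.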
2. Sales Data (JSON)
File: sales-data.json
Source: E-commerce platform API export
Format: JSON array (nested structure with customer objects and line item arrays)
Date range: 18 months (approximately January 2025 through June 2026)
Last updated: Three weeks ago (the API export runs monthly and the last export was delayed)
Approximate records: 12,000 orders
Top-level fields
| Field | Type | Description |
|---|---|---|
| order_id | string | Unique order identifier (sequential) |
| order_date | string (ISO date) | Date the order was placed |
| customer | object | Customer details (see nested structure below) |
| line_items | array | Products ordered (see nested structure below) |
| delivery_status | string | delivered, delayed, partial |
Customer object (nested)
| Field | Type | Description |
|---|---|---|
| customer_id | string | Unique customer identifier |
| customer_type | string | food_producer, ecommerce_brand, supermarket_chain |
| channel | string | direct, distributor, online |
Line items array (nested, 1-3 items per order)
| Field | Type | Description |
|---|---|---|
| product_type | string | food_container, mailer_bag, industrial_wrap |
| quantity | integer | Units ordered |
| unit_price | float | Price per unit in EUR |
| total | float | Line item total (quantity x unit_price) |
Notes: Because line items are nested, a single order can contain multiple product types. Supermarket chain orders tend to be larger in volume and have tighter delivery requirements. Because the export is three weeks old, production data from the most recent three weeks has no matching sales data.
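One way to work with the nested structure is to flatten it into one row per line item, repeating the order and customer fields on each row. The sketch below assumes the field names listed above. The function name `flatten_orders`, the identifier formats, and the two sample orders are hypothetical.

```python
import json

# Hypothetical two-order excerpt in the shape described above.
raw = """
[
  {"order_id": "ORD-0001", "order_date": "2025-01-06",
   "customer": {"customer_id": "C-001", "customer_type": "food_producer", "channel": "direct"},
   "line_items": [
     {"product_type": "food_container", "quantity": 1000, "unit_price": 0.25, "total": 250.0},
     {"product_type": "mailer_bag", "quantity": 200, "unit_price": 0.5, "total": 100.0}
   ],
   "delivery_status": "delivered"},
  {"order_id": "ORD-0002", "order_date": "2025-01-07",
   "customer": {"customer_id": "C-002", "customer_type": "ecommerce_brand", "channel": "online"},
   "line_items": [
     {"product_type": "mailer_bag", "quantity": 400, "unit_price": 0.5, "total": 200.0}
   ],
   "delivery_status": "delayed"}
]
"""


def flatten_orders(orders: list) -> list:
    """Return one flat dict per line item, repeating order and customer fields."""
    rows = []
    for order in orders:
        base = {
            "order_id": order["order_id"],
            "order_date": order["order_date"],
            "delivery_status": order["delivery_status"],
            **order["customer"],  # customer_id, customer_type, channel
        }
        for item in order["line_items"]:
            rows.append({**base, **item})
    return rows


flat_rows = flatten_orders(json.loads(raw))
print(len(flat_rows))  # 3 line items across the two sample orders
```

Flattening at the line-item level keeps `product_type` available for grouping, at the cost of repeating order-level fields. Order counts should therefore be taken over distinct `order_id` values rather than over rows.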
3. Procurement Records (CSV)
File: procurement-records.csv
Source: Finance team spreadsheets (exported monthly)
Format: CSV
Date range: 24 months (approximately July 2024 through June 2026)
Last updated: This month
Approximate rows: 500
| Column | Type | Description |
|---|---|---|
| month | string (YYYY-MM) | Month of the purchase |
| supplier | string | Supplier company name |
| material_type | string | pla_resin, paper_stock, recycled_content, adhesive, ink, packaging_film |
| quantity_kg | integer | Kilograms purchased |
| unit_cost_eur | float | Cost per kilogram in EUR |
| total_cost_eur | float | Total purchase cost (quantity_kg x unit_cost_eur) |
| lead_time_days | integer | Days from order to delivery |
| quality_grade | string | A (excellent), B (acceptable), C (below standard) |
Notes: PLA resin (used for food containers) comes from a single supplier. Lead times for PLA are significantly longer than other materials. Quality grade C deliveries have occasionally been linked to production quality issues downstream.
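Because these records come from manually maintained spreadsheets, it may be worth checking that `total_cost_eur` matches `quantity_kg x unit_cost_eur` before relying on it. The following is a minimal sketch of that check. The function name `find_total_mismatches`, the one-cent tolerance, and the sample rows (including supplier names and figures) are assumptions for illustration.

```python
import csv
import io
import math

# Hypothetical rows; supplier names and figures are placeholders.
sample_csv = """month,supplier,material_type,quantity_kg,unit_cost_eur,total_cost_eur,lead_time_days,quality_grade
2025-01,Supplier A,pla_resin,1000,2.50,2500.00,45,A
2025-01,Supplier B,paper_stock,2000,0.80,1650.00,10,B
"""


def find_total_mismatches(csv_text: str, tolerance_eur: float = 0.01) -> list:
    """Return rows whose total_cost_eur differs from quantity_kg * unit_cost_eur."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = int(row["quantity_kg"]) * float(row["unit_cost_eur"])
        recorded = float(row["total_cost_eur"])
        # Absolute tolerance only, so small rounding differences are ignored.
        if not math.isclose(expected, recorded, rel_tol=0.0, abs_tol=tolerance_eur):
            mismatches.append(row)
    return mismatches


mismatches = find_total_mismatches(sample_csv)
print([row["supplier"] for row in mismatches])
```

In the sample, the second row is flagged because 2000 x 0.80 is 1600.00 rather than the recorded 1650.00.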
4. Quality Results (CSV, derived from PDF lab reports)
File: quality-results.csv
Source: Quality lab reports (originally PDF, transcribed to CSV by the quality manager)
Format: CSV
Date range: 18 months (approximately January 2025 through June 2026)
Last updated: This week
Approximate rows: 3,000
| Column | Type | Description |
|---|---|---|
| batch_id | string | Unique batch identifier (sequential) |
| production_date | date | Date the batch was produced |
| production_line_id | string | LINE-A, LINE-B, LINE-C |
| product_type | string | food_container, mailer_bag, industrial_wrap |
| test_type | string | tensile_strength, moisture_barrier, compostability, weight_tolerance, seal_integrity |
| result_value | float | Numeric test result (units vary by test type) |
| pass_fail | string | pass, fail |
| pla_moisture_pct | float | PLA moisture content percentage (1.5-6.0, primarily relevant for food containers) |
| notes | string | Usually empty. Occasional "reprocessed" or "batch disposed" entries |
Notes: Each batch is tested against multiple quality criteria, so a batch can appear in several rows. The pla_moisture_pct field is measured for all batches but is most critical for food containers (LINE-A), where high moisture content affects seal integrity. Batches marked "reprocessed" or "batch disposed" in the notes field represent waste whose cost is tracked outside the production system.
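Since a batch can appear once per test, waste-flagged batches are best counted by distinct `batch_id` rather than by row. The sketch below shows one way to do that. The function name `waste_batches`, the case-insensitive substring match on the notes field, the batch identifier format, and the sample rows are all assumptions for illustration.

```python
import csv
import io

# Hypothetical rows; identifiers and test values are placeholders.
sample_quality_csv = """batch_id,production_date,production_line_id,product_type,test_type,result_value,pass_fail,pla_moisture_pct,notes
B-0001,2025-01-06,LINE-A,food_container,seal_integrity,0.92,pass,2.1,
B-0001,2025-01-06,LINE-A,food_container,weight_tolerance,1.01,pass,2.1,
B-0002,2025-01-06,LINE-A,food_container,seal_integrity,0.40,fail,5.4,batch disposed
B-0002,2025-01-06,LINE-A,food_container,weight_tolerance,0.99,pass,5.4,batch disposed
B-0003,2025-01-07,LINE-B,mailer_bag,tensile_strength,31.5,pass,1.8,Reprocessed
"""

WASTE_TERMS = ("reprocessed", "batch disposed")


def waste_batches(csv_text: str) -> dict:
    """Map each waste-flagged batch_id to the waste term found in its notes."""
    flagged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        note = (row["notes"] or "").strip().lower()
        for term in WASTE_TERMS:
            if term in note:
                # Keyed by batch_id, so repeated test rows count once.
                flagged[row["batch_id"]] = term
                break
    return flagged


waste = waste_batches(sample_quality_csv)
print(waste)
```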
Data Freshness
| Source | Last updated | Freshness status |
|---|---|---|
| Production logs | This week | Current |
| Sales data | 3 weeks ago | Stale: recent production has no matching sales |
| Procurement records | This month | Current |
| Quality results | This week | Current |
Because of the sales data lag, any analysis that connects recent production to sales revenue will have a gap of roughly three weeks. Findings that depend on the production-to-sales relationship should state this gap explicitly.
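One way to make the gap explicit in an analysis is to split production rows at the latest order date present in the sales export. The sketch below does this under the assumption that the latest `order_date` approximates the export cutoff. The function name `split_by_sales_cutoff` and the sample records are hypothetical.

```python
from datetime import date, datetime


def split_by_sales_cutoff(production_rows: list, sales_orders: list) -> tuple:
    """Split production rows into those on or before the last sales order date and those after it.

    Assumes the latest order_date in the export approximates the export cutoff.
    """
    cutoff = max(date.fromisoformat(order["order_date"]) for order in sales_orders)
    covered, uncovered = [], []
    for row in production_rows:
        shift_day = row["timestamp"].date()
        (covered if shift_day <= cutoff else uncovered).append(row)
    return cutoff, covered, uncovered


# Hypothetical records using the field names from the tables above.
sample_orders = [
    {"order_id": "ORD-0001", "order_date": "2026-05-28"},
    {"order_id": "ORD-0002", "order_date": "2026-06-01"},
]
sample_production = [
    {"production_line_id": "LINE-A", "timestamp": datetime(2026, 5, 30, 6, 0)},
    {"production_line_id": "LINE-B", "timestamp": datetime(2026, 6, 15, 14, 0)},
]

cutoff, covered, uncovered = split_by_sales_cutoff(sample_production, sample_orders)
print(cutoff, len(covered), len(uncovered))
```

Reporting the count of uncovered rows alongside any production-to-sales result gives readers a direct measure of how much recent production is excluded.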