Pipeline Specification: Textile Dyeing Quality Analysis
Client
Roberto Hernandez, Production Manager, Textiles del Pacifico S.A. de C.V., San Salvador, El Salvador.
Requirements
- Show which variables correlate with higher color match scores: temperature, humidity, chemical concentration, operator, and time of day
- Compare quality fairly across the three production lines, even though they have different equipment
- Track re-dye rates over time: are they improving or worsening, and on which lines?
- Filter by fabric type and dye color, since some colors are harder to match
- Update automatically as new batch data comes in daily
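For the first requirement, a reasonable starting point is the Pearson correlation between each numeric process variable and color_match_score. The sketch below assumes batches are already loaded as dictionaries with numeric values; the function names are hypothetical. Categorical drivers (operator, time of day) need group comparisons rather than a correlation coefficient, and correlation alone does not establish cause.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_score(batches: list[dict], variables: list[str]) -> dict[str, float]:
    """Correlate each numeric variable with color_match_score.

    Rows where either value is missing (None) are skipped for that variable.
    """
    result = {}
    for var in variables:
        pairs = [(b[var], b["color_match_score"]) for b in batches
                 if b.get(var) is not None and b.get("color_match_score") is not None]
        result[var] = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    return result
```

In the dbt project the same statistic could be computed per line or per dye formula in a mart, so the dashboard filters apply to it.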
Data sources
- Batch production data: CSV export from the production control system, generated daily at end of shift. Roughly 900 batches per day across 3 lines; 14 months of historical data available.
- Columns: batch_id, line_number, fabric_type, dye_formula, temperature, humidity, chemical_concentration, color_match_score, pass_fail, operator_id, timestamp
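Every CSV field arrives as text, so staging starts with type casts and NULL handling. A minimal sketch using the standard csv module, assuming that empty strings, NULL, and NA mark missing values and that timestamps are ISO 8601. Those conventions, the lowercasing of fabric_type, and the StagedBatch name are assumptions to confirm against a real export.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

NULL_TOKENS = {"", "NULL", "null", "NA"}  # assumed missing-value markers


def _float_or_none(raw: str) -> Optional[float]:
    """Cast a numeric field, mapping assumed NULL tokens to None."""
    return None if raw.strip() in NULL_TOKENS else float(raw)


@dataclass
class StagedBatch:
    batch_id: str
    line_number: int
    fabric_type: str
    dye_formula: str
    temperature: Optional[float]
    humidity: Optional[float]
    chemical_concentration: Optional[float]
    color_match_score: Optional[float]
    pass_fail: str
    operator_id: str
    timestamp: datetime


def parse_batches(csv_text: str) -> list[StagedBatch]:
    """Cast each CSV row to typed fields; blank numeric fields become None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append(StagedBatch(
            batch_id=rec["batch_id"].strip(),
            line_number=int(rec["line_number"]),
            fabric_type=rec["fabric_type"].strip().lower(),  # assumed: normalize case
            dye_formula=rec["dye_formula"].strip(),
            temperature=_float_or_none(rec["temperature"]),
            humidity=_float_or_none(rec["humidity"]),
            chemical_concentration=_float_or_none(rec["chemical_concentration"]),
            color_match_score=_float_or_none(rec["color_match_score"]),
            pass_fail=rec["pass_fail"].strip().upper(),
            operator_id=rec["operator_id"].strip(),
            timestamp=datetime.fromisoformat(rec["timestamp"].strip()),
        ))
    return rows
```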
Schema design
Grain decision: What is one row in your fact table?
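One natural grain is one row per batch_id. If that is the choice, staging has to enforce it, for example when a daily export overlaps the previous one. The sketch assumes that, for a repeated batch_id, the row with the latest timestamp wins; that tie-breaking rule and the function name are hypothetical and should be confirmed with whoever owns the production control system.

```python
def dedupe_to_batch_grain(rows: list[dict]) -> list[dict]:
    """Keep exactly one row per batch_id: the one with the latest timestamp.

    Timestamps only need to be mutually comparable (datetime objects or
    same-format ISO 8601 strings both work).
    """
    latest: dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["batch_id"])
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["batch_id"]] = row
    # Sort by key so the output order is deterministic across runs.
    return sorted(latest.values(), key=lambda r: r["batch_id"])
```

In dbt SQL the same rule is usually written with ROW_NUMBER() OVER (PARTITION BY batch_id ORDER BY timestamp DESC) and keeping only row 1.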
Staging layer: What source-conforming transformations are needed? (type casts, unit conversions, NULL handling)
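The verification targets require Celsius temperatures in the 65-100 range at staging. If the export mixes units, one possible rule is to treat readings above 100 as Fahrenheit and convert them with C = (F - 32) × 5/9. That cutoff is a hypothetical heuristic (it would misread any genuine Celsius reading above 100), so a unit flag from the source system would be safer if one exists.

```python
from typing import Optional

FAHRENHEIT_CUTOFF = 100.0       # assumed: no valid Celsius reading exceeds this
VALID_RANGE_C = (65.0, 100.0)   # from the verification targets


def to_celsius(reading: Optional[float]) -> Optional[float]:
    """Normalize a temperature reading to Celsius; NULL stays NULL."""
    if reading is None:
        return None
    if reading > FAHRENHEIT_CUTOFF:
        # Heuristic: values above the cutoff are assumed to be Fahrenheit.
        return (reading - 32.0) * 5.0 / 9.0
    return reading


def in_valid_range(celsius: Optional[float]) -> bool:
    """True when a converted reading falls inside the expected Celsius band."""
    if celsius is None:
        return False
    low, high = VALID_RANGE_C
    return low <= celsius <= high
```

For example, 176 °F converts to 80 °C and passes the range check; a reading of 50 is left unchanged and fails it, which flags it for review rather than silently converting it.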
Intermediate layer: What business logic transformations? (score normalization, window functions, aggregations)
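Lines may run different mixes of fabrics and colors, and some colors are harder to match, so raw color_match_score averages can penalize a line for its product mix. One possible normalization (the spec leaves the approach open) standardizes each score against batches of the same dye_formula, z = (x - mean) / sd, and then compares lines on their mean z-score. The grouping key is an assumption; fabric_type could be added to it. In most warehouses this maps to AVG and STDDEV_POP window functions partitioned by dye_formula.

```python
import statistics
from collections import defaultdict


def zscore_within_group(rows: list[dict], group_key: str = "dye_formula",
                        value_key: str = "color_match_score") -> list[float]:
    """Return one z-score per row, standardized within its group.

    Groups with zero spread (including single-row groups) get z = 0.0.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    # Population mean and standard deviation per group.
    stats = {k: (statistics.fmean(v), statistics.pstdev(v)) for k, v in groups.items()}
    out = []
    for row in rows:
        mean, sd = stats[row[group_key]]
        out.append(0.0 if sd == 0 else (row[value_key] - mean) / sd)
    return out


def mean_z_by_line(rows: list[dict]) -> dict[int, float]:
    """Average normalized score per production line."""
    z = zscore_within_group(rows)
    by_line: dict[int, list[float]] = defaultdict(list)
    for row, score in zip(rows, z):
        by_line[row["line_number"]].append(score)
    return {line: statistics.fmean(vals) for line, vals in sorted(by_line.items())}
```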
Mart layer: What final tables serve the analysis? (daily quality by line? operator performance? fabric type comparison?)
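As one example of a mart, daily quality by line could hold one row per (date, line_number) with batch count, re-dye count, and re-dye rate. The sketch assumes a batch needs re-dyeing exactly when pass_fail is "FAIL" and that timestamp is a datetime; both assumptions should be checked against the source data.

```python
from collections import defaultdict
from datetime import datetime


def daily_quality_by_line(rows: list[dict]) -> list[dict]:
    """Aggregate batches into one row per (production date, line)."""
    counts: dict[tuple, list[int]] = defaultdict(lambda: [0, 0])  # [batches, redyes]
    for row in rows:
        key = (row["timestamp"].date(), row["line_number"])
        counts[key][0] += 1
        if row["pass_fail"] == "FAIL":  # assumed: FAIL means the batch is re-dyed
            counts[key][1] += 1
    return [
        {"batch_date": day, "line_number": line, "batches": n,
         "redyes": r, "redye_rate": r / n}
        for (day, line), (n, r) in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = [
        {"timestamp": datetime(2024, 3, 1, 9), "line_number": 1, "pass_fail": "FAIL"},
        {"timestamp": datetime(2024, 3, 1, 15), "line_number": 1, "pass_fail": "PASS"},
    ]
    for rec in daily_quality_by_line(sample):
        print(rec)
```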
Layer architecture
Staging models:
Intermediate models:
Mart models:
Transformation logic
Unit conversions needed:
Score normalization approach:
Window functions needed:
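To show whether re-dye rates are improving or worsening on each line, a trailing window smooths day-to-day noise. The sketch computes a trailing re-dye rate per line from daily rows, weighting each day by its batch count (total re-dyes divided by total batches in the window) rather than averaging daily rates. The 7-day width is an assumption. It mirrors SUM(...) OVER (PARTITION BY line_number ORDER BY batch_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), which counts rows rather than calendar days, so it equals a true 7-day window only when every line has a row for every day.

```python
from collections import defaultdict

WINDOW_ROWS = 7  # assumed window width, in daily rows


def rolling_redye_rate(daily: list[dict], window: int = WINDOW_ROWS) -> list[dict]:
    """Trailing re-dye rate per line over the last `window` daily rows.

    Equivalent to a ROWS BETWEEN window-1 PRECEDING AND CURRENT ROW frame,
    partitioned by line_number and ordered by batch_date.
    """
    by_line: dict[int, list[dict]] = defaultdict(list)
    for row in daily:
        by_line[row["line_number"]].append(row)
    out = []
    for line in sorted(by_line):
        rows = sorted(by_line[line], key=lambda r: r["batch_date"])
        for i, row in enumerate(rows):
            frame = rows[max(0, i - window + 1): i + 1]
            batches = sum(r["batches"] for r in frame)
            redyes = sum(r["redyes"] for r in frame)
            out.append({"line_number": line, "batch_date": row["batch_date"],
                        "rolling_redye_rate": redyes / batches if batches else None})
    return out
```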
Jinja macros to create:
Quality testing strategy
dbt tests (row-level):
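Row-level tests assert properties every row must satisfy. dbt ships the generic tests unique, not_null, accepted_values, and relationships, and the dbt_utils package adds accepted_range. The Python sketch below mirrors a few of those rules so they can be prototyped against a CSV sample before being written as YAML; the accepted pass_fail values are assumed, and treating a NULL temperature as a failure is a choice, not a requirement from the spec.

```python
ACCEPTED_LINES = {1, 2, 3}
ACCEPTED_PASS_FAIL = {"PASS", "FAIL"}  # assumed source values
TEMP_RANGE_C = (65.0, 100.0)           # from the verification targets


def row_level_failures(rows: list[dict]) -> list[str]:
    """Return one message per violated row-level rule."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        batch_id = row.get("batch_id")
        if not batch_id:
            failures.append(f"row {i}: batch_id is null")            # like not_null
        elif batch_id in seen_ids:
            failures.append(f"row {i}: duplicate batch_id {batch_id}")  # like unique
        else:
            seen_ids.add(batch_id)
        if row.get("line_number") not in ACCEPTED_LINES:            # like accepted_values
            failures.append(f"row {i}: unexpected line_number {row.get('line_number')}")
        if row.get("pass_fail") not in ACCEPTED_PASS_FAIL:
            failures.append(f"row {i}: unexpected pass_fail {row.get('pass_fail')}")
        temp = row.get("temperature")
        # like accepted_range; NULL is treated as a failure here
        if temp is None or not TEMP_RANGE_C[0] <= temp <= TEMP_RANGE_C[1]:
            failures.append(f"row {i}: temperature {temp} outside 65-100 C")
    return failures
```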
Soda Core checks (batch-level):
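Batch-level checks look at a whole daily load rather than single rows; SodaCL expresses these with metrics such as row_count, missing_count, and duplicate_count. The sketch evaluates comparable checks in Python for one day's export. The ±20% tolerance around the expected ~900 rows and the 1% ceiling on missing scores are placeholder thresholds for the threshold decisions, not agreed values.

```python
EXPECTED_DAILY_ROWS = 900      # from the data source description
ROW_TOLERANCE = 0.20           # assumed: accept roughly 720-1080 rows
MAX_MISSING_SCORE_FRAC = 0.01  # assumed ceiling on NULL color_match_score


def batch_level_check(rows: list[dict]) -> dict[str, bool]:
    """Evaluate load-level checks for one daily export."""
    n = len(rows)
    low = EXPECTED_DAILY_ROWS * (1 - ROW_TOLERANCE)
    high = EXPECTED_DAILY_ROWS * (1 + ROW_TOLERANCE)
    missing = sum(1 for r in rows if r.get("color_match_score") is None)
    ids = [r.get("batch_id") for r in rows]
    return {
        "row_count_in_range": low <= n <= high,
        "missing_score_ok": n > 0 and missing / n <= MAX_MISSING_SCORE_FRAC,
        "no_duplicate_batch_ids": len(ids) == len(set(ids)),
        "all_lines_present": {r.get("line_number") for r in rows} == {1, 2, 3},
    }
```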
Threshold decisions:
Verification targets
See materials/CLAUDE.md for detailed verification targets. Key numbers:
- Overall re-dye rate: approximately 8-12%
- Line 1 quality: worse than Lines 2 and 3
- OP-001 and OP-003: better pass rates than other operators
- All staging temperatures: in Celsius, within the 65-100 °C range
- dbt build reproducibility: identical results on consecutive runs
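The key numbers above could be checked automatically after each build. The sketch compares computed metrics with the listed targets; the metric keys are hypothetical, "Line 1 worse" is measured here by re-dye rate, and "better pass rates" is read strictly as each of OP-001 and OP-003 beating every other operator. The exact definitions live in materials/CLAUDE.md and may differ. For reproducibility, it also fingerprints a table's rows so two consecutive dbt build runs can be compared by hash.

```python
import hashlib
import json


def check_targets(m: dict) -> dict[str, bool]:
    """Compare pipeline metrics (hypothetical keys) with the key verification targets."""
    ops = m["pass_rate_by_operator"]
    other_ops = [rate for op, rate in ops.items() if op not in ("OP-001", "OP-003")]
    by_line = m["redye_rate_by_line"]
    return {
        "redye_rate_8_to_12pct": 0.08 <= m["overall_redye_rate"] <= 0.12,
        # Higher re-dye rate is taken to mean worse quality.
        "line1_worse": by_line[1] > max(by_line[2], by_line[3]),
        "op001_op003_better": all(ops[op] > max(other_ops, default=0.0)
                                  for op in ("OP-001", "OP-003")),
        "temps_celsius": all(65.0 <= t <= 100.0 for t in m["staging_temperatures"]),
    }


def table_fingerprint(rows: list[dict]) -> str:
    """Order-independent SHA-256 of a table's rows, for run-to-run comparison."""
    lines = sorted(json.dumps(r, sort_keys=True, default=str) for r in rows)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```

Sorting the serialized rows before hashing makes the fingerprint insensitive to row order, which a warehouse does not guarantee without an ORDER BY, while still detecting any changed, added, or dropped row.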