Pipeline Specification: Textile Dyeing Quality Analysis

Client

Roberto Hernandez, Production Manager, Textiles del Pacifico S.A. de C.V., San Salvador, El Salvador.

Show which variables correlate with higher color match scores -- temperature, humidity, chemical concentration, operator, time of day
Compare quality fairly across the three production lines, even though they have different equipment
Track re-dye rates over time -- are they getting better or worse, and on which lines?
Filter by fabric type and dye color -- some colors are harder to match
Update automatically as new batch data comes in daily
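
As a starting point for the first requirement, here is a minimal sketch of a Pearson correlation between one numeric process variable (for example temperature) and color_match_score, in plain Python. The function name `pearson` is hypothetical and not part of the specification. It measures linear association only and applies to numeric columns; operator_id is categorical and would need a grouped comparison instead.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Pairs where either value is None are skipped. Returns None when
    fewer than two usable pairs remain or when one side is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)
```

Because the three lines run different equipment, it is probably safer to compute this per line (or on normalized scores) than on the pooled data.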

Batch production data: CSV export from the production control system, about 900 batches per day across 3 lines. 14 months of historical data are available. A new export arrives daily at end of shift.
Columns: batch_id, line_number, fabric_type, dye_formula, temperature, humidity, chemical_concentration, color_match_score, pass_fail, operator_id, timestamp

Grain decision: What is one row in your fact table?

Staging layer: What source-conforming transformations are needed? (type casts, unit conversions, NULL handling)
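
One way to think about the staging question is to sketch the casts in plain Python before writing the dbt model. The sketch below assumes that empty cells mean missing values, that pass_fail holds the strings "pass" or "fail", and that timestamp is ISO 8601; none of these is confirmed by the specification. The function names (`stage_row`, `stage_csv`) and the output keys `passed` and `batch_ts` are hypothetical. No unit conversion is shown because the source units are not stated.

```python
import csv
import io
from datetime import datetime


def _to_float(value):
    """Cast a CSV cell to float; empty or non-numeric cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def stage_row(raw):
    """Apply source-conforming casts to one raw CSV row (a dict of strings)."""
    flag = raw["pass_fail"].strip().lower()
    return {
        "batch_id": raw["batch_id"].strip(),
        "line_number": int(raw["line_number"]),
        # Assumption: fabric names should compare case-insensitively.
        "fabric_type": raw["fabric_type"].strip().lower(),
        "dye_formula": raw["dye_formula"].strip(),
        "temperature": _to_float(raw["temperature"]),
        "humidity": _to_float(raw["humidity"]),
        "chemical_concentration": _to_float(raw["chemical_concentration"]),
        "color_match_score": _to_float(raw["color_match_score"]),
        # Assumption: values are "pass" / "fail"; anything else becomes None.
        "passed": {"pass": True, "fail": False}.get(flag),
        "operator_id": raw["operator_id"].strip(),
        # Assumption: timestamps are ISO 8601, e.g. 2024-03-01T06:15:00.
        "batch_ts": datetime.fromisoformat(raw["timestamp"].strip()),
    }


def stage_csv(text):
    """Stage every row of a CSV export given as a single string."""
    return [stage_row(row) for row in csv.DictReader(io.StringIO(text))]
```

Keeping missing measurements as None, rather than 0, prevents them from dragging down averages in later layers.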

Intermediate layer: What business-logic transformations are needed? (score normalization, window functions, aggregations)
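
For score normalization, one candidate approach (an assumption, not something the specification prescribes) is a per-line z-score: each batch's color_match_score is expressed as a number of standard deviations from its own line's mean, so lines with different equipment can be compared on a common scale. The function name `normalize_by_line` and the output key `score_z` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_line(batches):
    """Add a per-line z-score ('score_z') to each batch dict.

    Batches with a missing score get score_z = None. A line whose
    scores are all identical gets score_z = 0.0 for every batch.
    """
    scores_by_line = defaultdict(list)
    for b in batches:
        if b["color_match_score"] is not None:
            scores_by_line[b["line_number"]].append(b["color_match_score"])

    # Population mean and standard deviation per production line.
    stats = {line: (mean(s), pstdev(s)) for line, s in scores_by_line.items()}

    out = []
    for b in batches:
        score = b["color_match_score"]
        z = None
        if score is not None:
            mu, sigma = stats[b["line_number"]]
            z = (score - mu) / sigma if sigma > 0 else 0.0
        out.append({**b, "score_z": z})
    return out
```

In a dbt model the same calculation would typically be a window function partitioned by line_number. Note that z-scoring removes any real quality gap between lines along with the equipment effect, so it answers "how good was this batch for its line" rather than "which line is best".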

Mart layer: What final tables serve the analysis? (daily quality by line? operator performance? fabric type comparison?)
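
To make the "daily quality by line" option concrete, here is a rough sketch of that mart at a grain of one row per day per line. It assumes staged batches carry a datetime in `batch_ts` and a boolean in `passed` (both hypothetical names), and it uses the share of failed batches as a stand-in for the re-dye rate, which is an assumption: the source data has no explicit re-dye column.

```python
from collections import defaultdict


def daily_quality_by_line(batches):
    """Aggregate batches to one row per (day, line_number)."""
    groups = defaultdict(list)
    for b in batches:
        groups[(b["batch_ts"].date(), b["line_number"])].append(b)

    rows = []
    for (day, line), members in sorted(groups.items()):
        scores = [m["color_match_score"] for m in members
                  if m["color_match_score"] is not None]
        # Only batches with a known pass/fail outcome count toward the rate.
        judged = [m for m in members if m["passed"] is not None]
        failed = sum(1 for m in judged if not m["passed"])
        rows.append({
            "day": day,
            "line_number": line,
            "batch_count": len(members),
            "avg_color_match_score": sum(scores) / len(scores) if scores else None,
            # Assumption: a failed batch is re-dyed, so this approximates re-dye rate.
            "fail_rate": failed / len(judged) if judged else None,
        })
    return rows
```

Adding fabric_type and dye_formula to the grouping key would support the filtering requirement, at the cost of a larger table.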

Staging models:

Intermediate models:

Mart models:

Unit conversions needed:

Score normalization approach:

Window functions needed:

Jinja macros to create:

dbt tests (row-level):

Soda Core checks (batch-level):

Threshold decisions:

See materials/CLAUDE.md for detailed verification targets. Key numbers: