pipeline-spec-template.md

Pipeline Specification: Textile Dyeing Quality Analysis

Client

Roberto Hernandez, Production Manager, Textiles del Pacifico S.A. de C.V., San Salvador, El Salvador.

Requirements

  1. Show which variables correlate with higher color match scores: temperature, humidity, chemical concentration, operator, and time of day
  2. Compare quality fairly across the three production lines, even though they have different equipment
  3. Track re-dye rates over time: are they improving or worsening, and on which lines?
  4. Filter by fabric type and dye color, since some colors are harder to match
  5. Update automatically as new batch data arrives daily

Data sources

  • Batch production data: CSV export from the production control system. ~900 batches per day across 3 lines. 14 months of historical data available. Exported daily at end of shift.
  • Columns: batch_id, line_number, fabric_type, dye_formula, temperature, humidity, chemical_concentration, color_match_score, pass_fail, operator_id, timestamp

Schema design

Grain decision: What is one row in your fact table?

Staging layer: What source-conforming transformations are needed? (type casts, unit conversions, NULL handling)
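To make the staging questions concrete, here is a minimal Python sketch of a source-conforming cleanup for one CSV row. The column names come from the data sources list. Everything else is an assumption for illustration: the helper names (`to_float`, `to_celsius`, `stage_row`), the output column `temperature_c`, and especially the heuristic that a reading above 100 is Fahrenheit. Confirm the real units against the export before relying on any rule like this.

```python
from typing import Optional


def to_float(raw: str) -> Optional[float]:
    """Cast a CSV field to float; blank or non-numeric text becomes None."""
    text = raw.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def to_celsius(value: Optional[float]) -> Optional[float]:
    """ASSUMPTION: readings above 100 are Fahrenheit and get converted."""
    if value is None:
        return None
    if value > 100:
        return round((value - 32) * 5 / 9, 2)
    return value


def stage_row(row: dict) -> dict:
    """Apply type casts, unit conversion, and NULL handling to one raw row."""
    return {
        "batch_id": row["batch_id"].strip(),
        "line_number": int(row["line_number"]),
        "temperature_c": to_celsius(to_float(row["temperature"])),  # hypothetical output name
        "humidity": to_float(row["humidity"]),
        "color_match_score": to_float(row["color_match_score"]),
    }


# Made-up example row, not real production data.
example = stage_row({
    "batch_id": " B-1 ",
    "line_number": "2",
    "temperature": "185",
    "humidity": "",
    "color_match_score": "91.5",
})
```

The sketch leaves business logic out on purpose: it only casts, converts, and turns blanks into NULLs, which is the kind of work the staging layer is asked about here.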

Intermediate layer: What business logic transformations? (score normalization, window functions, aggregations)

Mart layer: What final tables serve the analysis? (daily quality by line? operator performance? fabric type comparison?)

Layer architecture

Staging models:

Intermediate models:

Mart models:

Transformation logic

Unit conversions needed:

Score normalization approach:
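One candidate approach, sketched in Python under stated assumptions: a per-line z-score, which expresses each batch's color match score relative to the mean and spread of its own production line. This is only one option for comparing lines with different equipment, not a prescribed method. The function name `normalize_by_line`, the output field `score_z`, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_line(batches):
    """Return copies of the batches with 'score_z' computed within each line."""
    scores_by_line = defaultdict(list)
    for b in batches:
        scores_by_line[b["line_number"]].append(b["color_match_score"])

    # Population mean and standard deviation per production line.
    stats = {line: (mean(s), pstdev(s)) for line, s in scores_by_line.items()}

    out = []
    for b in batches:
        mu, sigma = stats[b["line_number"]]
        # A line with no spread gets z = 0 rather than a division by zero.
        z = 0.0 if sigma == 0 else (b["color_match_score"] - mu) / sigma
        out.append({**b, "score_z": z})
    return out


# Made-up sample values for illustration only.
sample = [
    {"batch_id": "A", "line_number": 1, "color_match_score": 80.0},
    {"batch_id": "B", "line_number": 1, "color_match_score": 90.0},
    {"batch_id": "C", "line_number": 2, "color_match_score": 92.0},
    {"batch_id": "D", "line_number": 2, "color_match_score": 96.0},
]
normalized = normalize_by_line(sample)
```

A trade-off worth noting in the spec: z-scores make batches comparable across lines but hide absolute differences between lines, so a mart that must show that one line is worse overall still needs the raw scores.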

Window functions needed:
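As an illustration of what a window function might compute here, the Python sketch below derives a trailing re-dye rate per line, analogous to a SQL window partitioned by line and ordered by date. It assumes the input is already aggregated to one row per line per day, and that a failed batch counts as a re-dye; neither is confirmed by the source. The function name and sample numbers are hypothetical.

```python
from collections import defaultdict, deque


def rolling_redye_rate(daily, window=7):
    """Compute a trailing re-dye rate per line.

    daily: iterable of (date_str, line_number, batches, failed), sorted by date.
    Returns (date_str, line_number, rate) using the last `window` rows per line.
    """
    # One bounded history per line; old rows fall off automatically.
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for day, line, batches, failed in daily:
        history[line].append((batches, failed))
        total = sum(b for b, _ in history[line])
        fails = sum(f for _, f in history[line])
        out.append((day, line, fails / total if total else None))
    return out


# Made-up daily counts for a single line.
daily = [
    ("2024-01-01", 1, 300, 30),
    ("2024-01-02", 1, 300, 60),
    ("2024-01-03", 1, 300, 30),
]
rates = rolling_redye_rate(daily, window=2)
```

The window here counts rows, not calendar days, so a day with no export silently stretches the window. Whether that is acceptable is a decision to record in this section.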

Jinja macros to create:

Quality testing strategy

dbt tests (row-level):

Soda Core checks (batch-level):

Threshold decisions:

Verification targets

See materials/CLAUDE.md for detailed verification targets. Key numbers:

  • Overall re-dye rate: approximately 8-12%
  • Line 1 quality: worse than Lines 2-3
  • OP-001 and OP-003: better pass rates than other operators
  • All staging temperatures: in Celsius, within the 65-100 °C range
  • dbt build reproducibility: identical results on consecutive runs
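Two of the targets above are simple enough to check mechanically. The Python sketch below is one way to do so, assuming `pass_fail` holds the strings "pass" and "fail", that the re-dye rate is the share of failed batches, and that the staged temperature column is called `temperature_c`. All three are assumptions, as are the function name and the sample rows.

```python
def check_targets(batches):
    """Return check name -> bool for the re-dye rate and temperature targets."""
    total = len(batches)
    failed = sum(1 for b in batches if b["pass_fail"] == "fail")
    redye_rate = failed / total if total else 0.0

    # Every staged temperature must sit inside the expected Celsius range.
    temps_ok = all(65 <= b["temperature_c"] <= 100 for b in batches)

    return {
        "redye_rate_8_to_12_pct": 0.08 <= redye_rate <= 0.12,
        "temperatures_celsius_65_100": temps_ok,
    }


# Made-up rows: 1 failure in 10 batches, temperatures from 70 to 79.
sample = [
    {"pass_fail": "fail" if i == 0 else "pass", "temperature_c": 70.0 + i}
    for i in range(10)
]
results = check_targets(sample)
```

The remaining targets (line and operator comparisons, build reproducibility) need grouped or cross-run comparisons and are not covered by this sketch.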