CLAUDE.md

Project: Textile Dyeing Quality Analysis

Client

Roberto Hernandez, Production Manager at Textiles del Pacifico S.A. de C.V. in San Salvador, El Salvador. The plant runs three production lines that dye and finish knitted fabrics for US athleisure brands. The re-dye rate is about 8% and must fall below 5% for contract renewal.

What you're building

A dbt pipeline that ingests batch production data from three dyeing lines, normalizes measurements across different equipment and scoring systems, calculates quality metrics by line/operator/fabric type, and monitors quality trends over time with Soda Core. The pipeline feeds dashboards that show Roberto which variables drive color match quality.

Tech stack

  • dbt Core with DuckDB adapter
  • Soda Core (quality monitoring with trend analysis)
  • Dagster (orchestration, freshness policies)
  • GitHub Actions (CI/CD quality gates)
  • DuckDB (local warehouse)
  • Python 3.x

Data dictionary

| Column | Type | Description | Notes |
|---|---|---|---|
| batch_id | string | Unique batch identifier | Format: LN-YYYYMMDD-NNN (e.g., L1-20250115-042) |
| line_number | integer | Production line (1, 2, or 3) | Line 1 is older American equipment |
| fabric_type | string | "polyester" or "cotton_blend" | Affects which color match scale is used |
| dye_formula | string | Dye formula code | 8 distinct formulas (e.g., PMS-2145, RB-0087) |
| temperature | float | Dyeing temperature | KNOWN ISSUE: Line 1 records in Fahrenheit (155-210), Lines 2-3 in Celsius (65-98). Staging must convert Line 1 to Celsius. |
| humidity | float | Plant humidity percentage | 45-85%. Occasional NULL values from sensor gaps. |
| chemical_concentration | float | Chemical concentration in g/L | Range 2.0-8.5 |
| color_match_score | float | How closely the dye matches the target color | KNOWN ISSUE: Two different scales. Polyester: Delta-E (0-6, lower is better, pass < 2.0). Cotton blend: spectrophotometer (0-100, higher is better, pass > 95). Both called "color_match_score" in the data. Normalization required in intermediate layer. |
| pass_fail | boolean | Whether batch passed quality check | Derived from color_match_score and fabric_type |
| operator_id | string | Operator who ran the batch | 6 operators: OP-001 through OP-006 |
| timestamp | datetime | When the batch was processed | Work hours 6am-10pm |
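
The batch_id format and the pass_fail rule in the data dictionary can be sketched in Python for use during profiling. This is a minimal illustration, not the pipeline's implementation: the function names `parse_batch_id` and `derive_pass_fail` are hypothetical, and the sketch assumes the digit after `L` in batch_id is the line number (1-3), which the example L1-20250115-042 suggests but the dictionary does not state outright.

```python
import re
from typing import Optional

# Documented format LN-YYYYMMDD-NNN, e.g. L1-20250115-042.
# Assumption: the digit after "L" is the production line (1, 2, or 3).
BATCH_ID_PATTERN = re.compile(r"L([1-3])-(\d{8})-(\d{3})")

# Pass thresholds as stated in the data dictionary.
DELTA_E_PASS_BELOW = 2.0   # polyester: Delta-E, lower is better
SPECTRO_PASS_ABOVE = 95.0  # cotton_blend: spectrophotometer, higher is better


def parse_batch_id(batch_id: str) -> Optional[dict]:
    """Split a batch_id into its parts, or return None if it is malformed."""
    match = BATCH_ID_PATTERN.fullmatch(batch_id)
    if match is None:
        return None
    line, date, sequence = match.groups()
    return {"line_number": int(line), "date": date, "sequence": int(sequence)}


def derive_pass_fail(fabric_type: str, color_match_score: float) -> bool:
    """Apply the scale that matches the fabric type."""
    if fabric_type == "polyester":
        return color_match_score < DELTA_E_PASS_BELOW
    if fabric_type == "cotton_blend":
        return color_match_score > SPECTRO_PASS_ABOVE
    raise ValueError(f"unknown fabric_type: {fabric_type!r}")
```

Comparing `derive_pass_fail` against the recorded pass_fail column is one way to spot rows where the wrong scale was applied.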

Naming conventions

  • stg_ prefix: staging models (source-conform, no business logic)
  • int_ prefix: intermediate models (joins, calculations, business logic)
  • fct_ prefix: fact tables (mart layer)
  • dim_ prefix: dimension tables (mart layer)

Known data quality concerns

  1. Temperature units: Line 1 is Fahrenheit, Lines 2-3 are Celsius. Must convert in staging.
  2. Color match score dual scales: Polyester uses Delta-E (lower is better), cotton blend uses spectrophotometer (higher is better). Must normalize in intermediate layer.
  3. NULL humidity values: Sensor gaps produce occasional NULLs. Handle in staging (don't drop rows).
  4. Window function non-determinism: Some batches share operator_id + timestamp. Window functions need a tiebreaker column (batch_id) to produce deterministic results.
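
Concerns 1, 3, and 4 can be illustrated with a short Python sketch. In the pipeline this logic would live in dbt SQL models; the functions below only mirror it for reasoning and spot checks. All names (`to_celsius`, `flag_missing_humidity`, `rank_batches`, `batch_seq`, `humidity_is_missing`) are hypothetical, and adding a missing-value flag is just one possible way to handle NULL humidity without dropping rows.

```python
from typing import Optional


def to_celsius(line_number: int, temperature: float) -> float:
    """Line 1 records Fahrenheit; Lines 2-3 already record Celsius."""
    if line_number == 1:
        return (temperature - 32.0) * 5.0 / 9.0
    return temperature


def flag_missing_humidity(humidity: Optional[float]) -> dict:
    """Keep the row and mark the gap instead of dropping or guessing a value."""
    return {"humidity": humidity, "humidity_is_missing": humidity is None}


def rank_batches(batches: list) -> list:
    """Number each operator's batches in time order.

    batch_id is the final sort key, so batches that share operator_id and
    timestamp always receive the same position regardless of input order.
    """
    ordered = sorted(
        batches,
        key=lambda b: (b["operator_id"], b["timestamp"], b["batch_id"]),
    )
    ranked = []
    previous_operator = None
    position = 0
    for batch in ordered:
        if batch["operator_id"] != previous_operator:
            previous_operator = batch["operator_id"]
            position = 0
        position += 1
        ranked.append({**batch, "batch_seq": position})
    return ranked
```

The same idea carries over to SQL: a window ordered only by timestamp leaves ties unresolved, while ordering by timestamp and then batch_id fixes the result.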

Work breakdown

  1. Profile dataset and discover data quality issues
  2. Design schema with unit conversions and normalization strategy
  3. Build staging models (temperature conversion, NULL handling)
  4. Build intermediate models (window functions, color score normalization macro, line-level quality)
  5. Build mart models (daily quality by line/operator/fabric)
  6. Add dbt tests (structural + business logic)
  7. Add Soda Core trend checks (batch count anomaly, quality score ranges)
  8. Configure Dagster freshness policies and materialization schedule
  9. Set up GitHub Actions CI/CD quality gates

Verification targets

  • Overall re-dye rate should be approximately 8-12% (consistent with Roberto's estimate of about 8%)
  • Line 1 should show worse quality than Lines 2-3
  • OP-001 and OP-003 should show better pass rates than other operators
  • Temperature values in staging output should all be in Celsius (65-100 range)
  • Running dbt build twice should produce identical results (deterministic window functions)
  • Soda Core checks should pass on normal days and flag the maintenance day
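
Several of these targets can be spot-checked with small helpers once model output has been exported as Python rows. The sketch below is illustrative only: the helper names are hypothetical, and `redye_rate` assumes that every failed batch is re-dyed, which the brief implies but does not state.

```python
import hashlib
import json


def redye_rate(pass_fail_values) -> float:
    """Share of failed batches. Assumption: each failed batch is re-dyed."""
    values = list(pass_fail_values)
    if not values:
        raise ValueError("no batches to evaluate")
    failed = sum(1 for passed in values if not passed)
    return failed / len(values)


def all_in_celsius_range(temperatures, low: float = 65.0, high: float = 100.0) -> bool:
    """True when every staged temperature lies in the expected Celsius range."""
    return all(low <= t <= high for t in temperatures)


def fingerprint(rows) -> str:
    """Order-independent hash of a result set, for comparing two build runs.

    Rows are serialized and sorted first because table row order is not
    guaranteed; only the row contents should decide whether runs match.
    """
    serialized = sorted(json.dumps(row, sort_keys=True, default=str) for row in rows)
    digest = hashlib.sha256()
    for line in serialized:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Equal fingerprints from two consecutive builds suggest the window functions are deterministic; a mismatch points to a missing tiebreaker.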

Commit convention

Commit after each major pipeline stage (staging complete, intermediate complete, mart complete, tests added, CI/CD configured). Use descriptive messages, such as "feat: add staging model with temperature conversion" or "test: add Soda Core trend checks for batch count anomaly".