Project: Textile Dyeing Quality Analysis

Client

Roberto Hernandez, Production Manager at Textiles del Pacifico S.A. de C.V. in San Salvador, El Salvador. The plant runs three production lines that dye and finish knitted fabrics for US athleisure brands. The current re-dye rate is about 8%; it must fall below 5% for the contract to be renewed.

What you're building

A dbt pipeline that ingests batch production data from three dyeing lines, normalizes measurements across different equipment and scoring systems, calculates quality metrics by line, operator, and fabric type, and monitors quality trends over time with Soda Core. The pipeline feeds dashboards that show Roberto which process variables (temperature, humidity, chemical concentration, dye formula, operator, line) drive color match quality.

Tech stack

dbt Core with DuckDB adapter
Soda Core (quality monitoring with trend analysis)
Dagster (orchestration, freshness policies)
GitHub Actions (CI/CD quality gates)
DuckDB (local warehouse)
Python 3.x

Data dictionary

Column	Type	Description	Notes
batch_id	string	Unique batch identifier	Format: L<line>-YYYYMMDD-NNN, where <line> is the line number (e.g., L1-20250115-042)
line_number	integer	Production line (1, 2, or 3)	Line 1 is older American equipment
fabric_type	string	"polyester" or "cotton_blend"	Affects which color match scale is used
dye_formula	string	Dye formula code	8 distinct formulas (e.g., PMS-2145, RB-0087)
temperature	float	Dyeing temperature	KNOWN ISSUE: Line 1 records in Fahrenheit (155-210 °F, roughly 68-99 °C), Lines 2-3 in Celsius (65-98 °C). Staging must convert Line 1 to Celsius: C = (F - 32) × 5/9.
humidity	float	Plant humidity percentage	45-85%. Occasional NULL values from sensor gaps.
chemical_concentration	float	Chemical concentration in g/L	Range 2.0-8.5
color_match_score	float	How closely the dyed fabric matches the target color	KNOWN ISSUE: Two different scales share this one column. Polyester: Delta-E (0-6, lower is better, pass < 2.0). Cotton blend: spectrophotometer (0-100, higher is better, pass > 95). Both are stored under the same name, "color_match_score". Normalization required in intermediate layer.
pass_fail	boolean	Whether batch passed quality check	Derived from color_match_score and fabric_type
operator_id	string	Operator who ran the batch	6 operators: OP-001 through OP-006
timestamp	datetime	When the batch was processed	Falls within work hours, 6am-10pm
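
The staging rules for temperature and humidity can be sketched in Python to make the expected behavior concrete. This is a minimal sketch, assuming raw rows arrive as dicts keyed by the column names above; the helper names and the `temperature_c` output column are hypothetical, and in the pipeline itself this logic would live in the dbt staging SQL.

```python
from __future__ import annotations

import re

# Hypothetical validator for the documented batch_id format L<line>-YYYYMMDD-NNN.
BATCH_ID_RE = re.compile(r"L[123]-\d{8}-\d{3}")


def is_valid_batch_id(batch_id: str) -> bool:
    """Return True if batch_id matches L<line>-YYYYMMDD-NNN for lines 1-3."""
    return BATCH_ID_RE.fullmatch(batch_id) is not None


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Standard conversion: C = (F - 32) * 5/9."""
    return (temp_f - 32.0) * 5.0 / 9.0


def stage_row(row: dict) -> dict:
    """Source-conform one raw record (hypothetical output column: temperature_c).

    - Line 1 temperatures are Fahrenheit and are converted to Celsius.
    - Lines 2-3 are already Celsius and pass through unchanged.
    - NULL humidity (None) is kept as-is; the row is never dropped.
    """
    staged = dict(row)
    temp = staged.pop("temperature")
    if staged["line_number"] == 1:
        temp = fahrenheit_to_celsius(temp)
    staged["temperature_c"] = round(temp, 2)
    return staged
```

Keeping a NULL humidity value instead of filtering the row preserves batch counts, which the batch-count trend checks would otherwise under-report.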

Naming conventions

stg_ prefix: staging models (source-conformed, no business logic)
int_ prefix: intermediate models (joins, calculations, business logic)
fct_ prefix: fact tables (mart layer)
dim_ prefix: dimension tables (mart layer)
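
Because the prefix determines the layer, a CI step could reject models that break the convention. A minimal sketch, assuming model names are checked as plain strings; `layer_for_model` and the example model names are hypothetical.

```python
# Prefix-to-layer mapping taken from the naming conventions above.
LAYER_BY_PREFIX = {
    "stg_": "staging",
    "int_": "intermediate",
    "fct_": "mart",
    "dim_": "mart",
}


def layer_for_model(model_name: str) -> str:
    """Return the layer a model belongs to, or raise if it has no recognized prefix."""
    for prefix, layer in LAYER_BY_PREFIX.items():
        if model_name.startswith(prefix):
            return layer
    raise ValueError(f"model {model_name!r} does not use stg_/int_/fct_/dim_")
```

For example, `layer_for_model("fct_daily_quality")` returns `"mart"`, while an unprefixed name such as `daily_quality` raises `ValueError`.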

Known data quality concerns

Temperature units: Line 1 is Fahrenheit, Lines 2-3 are Celsius. Must convert in staging.
Color match score dual scales: Polyester uses Delta-E (lower is better), cotton blend uses spectrophotometer (higher is better). Must normalize in intermediate layer.
NULL humidity values: Sensor gaps produce occasional NULLs. Handle in staging (don't drop rows).
Window function non-determinism: Some batches share operator_id + timestamp. Window functions need a tiebreaker column (batch_id) to produce deterministic results.
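
Two of these concerns, the dual color scales and the window-function ties, are easiest to see in code. The sketch below assumes a target scale the spec does not prescribe: both scores are mapped onto 0.0-1.0 where higher means a closer match, while pass/fail still uses each scale's own threshold. `operator_sequence` emulates `ROW_NUMBER() OVER (PARTITION BY operator_id ORDER BY timestamp, batch_id)`. All names here are hypothetical; the real normalization would be a dbt macro in the intermediate layer.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# Assumed normalized scale (not specified in the spec): 0.0-1.0, higher is better.
DELTA_E_MAX = 6.0    # polyester Delta-E upper bound from the data dictionary
SPECTRO_MAX = 100.0  # cotton-blend spectrophotometer upper bound


def normalize_color_score(fabric_type: str, score: float) -> float:
    """Map both raw scales onto 0.0-1.0 where higher means a closer match."""
    if fabric_type == "polyester":
        clipped = min(max(score, 0.0), DELTA_E_MAX)
        return 1.0 - clipped / DELTA_E_MAX  # Delta-E: lower raw value is better
    if fabric_type == "cotton_blend":
        clipped = min(max(score, 0.0), SPECTRO_MAX)
        return clipped / SPECTRO_MAX  # spectrophotometer: higher is better
    raise ValueError(f"unknown fabric_type: {fabric_type!r}")


def passes(fabric_type: str, score: float) -> bool:
    """Apply each scale's own pass threshold to the raw score."""
    if fabric_type == "polyester":
        return score < 2.0
    if fabric_type == "cotton_blend":
        return score > 95.0
    raise ValueError(f"unknown fabric_type: {fabric_type!r}")


@dataclass(frozen=True)
class Batch:
    batch_id: str
    operator_id: str
    timestamp: datetime


def operator_sequence(batches: list[Batch]) -> dict[str, int]:
    """Number batches per operator by timestamp, breaking ties on batch_id."""
    ordered = sorted(batches, key=lambda b: (b.operator_id, b.timestamp, b.batch_id))
    counters: dict[str, int] = {}
    seq: dict[str, int] = {}
    for b in ordered:
        counters[b.operator_id] = counters.get(b.operator_id, 0) + 1
        seq[b.batch_id] = counters[b.operator_id]
    return seq
```

Without `batch_id` in the sort key, Python's stable sort would keep tied rows in input order, so results would depend on how rows were read; in SQL the order of tied rows is simply unspecified. The tiebreaker makes both deterministic.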

Work breakdown

Profile dataset and discover data quality issues
Design schema with unit conversions and normalization strategy
Build staging models (temperature conversion, NULL handling)
Build intermediate models (window functions, color score normalization macro, line-level quality)
Build mart models (daily quality by line/operator/fabric)
Add dbt tests (structural + business logic)
Add Soda Core trend checks (batch count anomaly, quality score ranges)
Configure Dagster freshness policies and materialization schedule
Set up GitHub Actions CI/CD quality gates
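
The mart step (daily quality by line/operator/fabric) reduces to a grouped aggregation. A minimal Python sketch, assuming rows are already staged dicts with a `datetime` timestamp and a boolean `pass_fail`, and assuming a re-dye is simply a failed batch; `daily_quality` and the output field names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, datetime


def daily_quality(rows: list[dict]) -> dict[tuple[date, int, str, str], dict]:
    """Aggregate batches by (day, line_number, operator_id, fabric_type)."""
    groups: dict[tuple[date, int, str, str], list[bool]] = defaultdict(list)
    for r in rows:
        key = (r["timestamp"].date(), r["line_number"], r["operator_id"], r["fabric_type"])
        groups[key].append(r["pass_fail"])

    result = {}
    for key, outcomes in groups.items():
        total = len(outcomes)
        passed = sum(outcomes)  # True counts as 1
        result[key] = {
            "batch_count": total,
            "pass_rate": passed / total,
            "redye_rate": (total - passed) / total,  # assumes re-dye == failed batch
        }
    return result


if __name__ == "__main__":
    # Usage example: four batches by one operator on one day, one failure.
    demo = [
        {"timestamp": datetime(2025, 1, 15, 7), "line_number": 1,
         "operator_id": "OP-001", "fabric_type": "polyester", "pass_fail": p}
        for p in (True, True, True, False)
    ]
    print(daily_quality(demo))
```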

Verification targets

Overall re-dye rate should be approximately 8-12% (consistent with Roberto's ~8% estimate)
Line 1 should show worse quality than Lines 2-3
OP-001 and OP-003 should show better pass rates than other operators
Temperature values in staging output should all be in Celsius (65-100 range)
Running dbt build twice should produce identical results (deterministic window functions)
Soda Core checks should pass on normal days and flag the maintenance day
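
Several of these targets can be checked mechanically after `dbt build`. A minimal Python sketch, assuming the relevant columns have already been queried out of DuckDB into plain lists or dicts; the function names and the 50%-of-median threshold are hypothetical, and the batch-count check is a simplified stand-in, not Soda Core's own anomaly-detection method.

```python
from __future__ import annotations

from statistics import median


def redye_rate(pass_flags: list[bool]) -> float:
    """Share of batches that failed (assumed to equal the re-dye rate)."""
    return 1.0 - sum(pass_flags) / len(pass_flags)


def temps_in_celsius_range(temps_c: list[float], low: float = 65.0, high: float = 100.0) -> bool:
    """True if every staged temperature falls in the expected Celsius range."""
    return all(low <= t <= high for t in temps_c)


def runs_identical(run_a: list[tuple], run_b: list[tuple]) -> bool:
    """Compare two builds' output rows, ignoring physical row order."""
    return sorted(run_a) == sorted(run_b)


def flag_low_volume_days(daily_counts: dict[str, int], ratio: float = 0.5) -> list[str]:
    """Flag days whose batch count is below `ratio` x the median daily count."""
    typical = median(daily_counts.values())
    return sorted(day for day, n in daily_counts.items() if n < ratio * typical)
```

With counts of 40, 42, 5 and 41 batches over four days, only the 5-batch day falls below half the median (40.5) and is flagged, which is the behavior the maintenance-day target describes.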

Commit convention

Commit after each major pipeline stage (staging complete, intermediate complete, mart complete, tests added, CI/CD configured). Use descriptive messages: "feat: add staging model with temperature conversion" or "test: add Soda Core trend checks for batch count anomaly."

CLAUDE.md