Learn by Directing AI

The Brief

Roberto Hernandez manages production at Textiles del Pacifico in San Salvador -- a textile dyeing and finishing operation running three production lines. Each line processes about 30 batches per day, dyeing knitted fabrics for US athleisure brands. Every batch gets a color match score measuring how close the dye color is to what the brand ordered.

The re-dye rate is at 8%. Roberto's biggest client says it needs to be under 5% by the next contract review in three months. He has 14 months of batch records -- machine settings, temperatures, chemical concentrations, humidity, operator IDs, color match scores -- but comparing quality across the three lines is complicated because Line 1 runs older American equipment with different settings ranges.

Roberto can tell you that humidity matters and that some operators produce better results than others. What he can't tell you is why, or which combination of variables actually drives quality. The data is there. The analysis isn't.

Your Role

You're building the analytical pipeline that makes sense of Roberto's batch data -- transforming raw production records into quality metrics that show which variables drive color match scores, how the three lines compare fairly, and whether things are getting better or worse over time.

The transformation work is more complex than previous projects. You'll use window functions to track operator trends, handle joins between tables at different grains, and build reusable Jinja macros for logic that appears in multiple models. You'll also add a second quality monitoring layer -- Soda Core for trend analysis alongside dbt tests for row-level checks -- because some failures only show up as patterns across batches, not as individual bad rows.
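To make the shape of that work concrete, here is a minimal dbt model sketch. The table, column, and macro names (`stg_batches`, `operator_id`, `normalize_score`, the 7-batch window) are assumptions for illustration, not the project's actual schema:

```sql
with batches as (
    select * from {{ ref('stg_batches') }}
),

operator_trends as (
    select
        batch_id,
        operator_id,
        color_match_score,
        -- rolling average over the operator's last 7 batches,
        -- including the current one
        avg(color_match_score) over (
            partition by operator_id
            order by batch_timestamp
            rows between 6 preceding and current row
        ) as operator_rolling_avg
    from batches
)

select
    *,
    -- a Jinja macro centralizes the Line 1 settings adjustment so the
    -- same logic isn't copy-pasted across models (macro name hypothetical)
    {{ normalize_score('color_match_score', 'line_id') }} as normalized_score
from operator_trends
```

The macro call compiles to plain SQL at build time, so the normalization logic lives in one place even if a dozen models use it.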

You'll plan the pipeline work before starting, using plan mode to decompose and sequence the steps. And you'll create the project's CLAUDE.md yourself this time -- the data dictionary and naming conventions that shape every model Claude produces.

What's New

Last time you connected three data sources for Francoise's timber operation, resolved identity across systems, and orchestrated with Dagster for the first time. The pipeline was complex because of multiple sources and identity resolution -- but the transformations themselves were straightforward joins and aggregations.

This time the sources are simpler (one CSV), but the transformations are harder. Window functions can look correct row-by-row yet produce wrong results in aggregate. Joining tables at different grains silently inflates totals. You'll encounter numbers that look right and aren't, and the only way to catch them is to reconcile aggregates against the raw source data or to check them against Roberto's domain knowledge.
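The grain problem is easiest to see in a small sketch. Assume a hypothetical `batches` table with one row per batch and a `chemical_additions` table with several rows per batch (neither name is from the project's real schema):

```sql
-- Joining directly fans out the batch rows: each batch is repeated
-- once per chemical addition, so the total weight is inflated.
select sum(b.batch_weight_kg)          -- looks plausible, but wrong
from batches b
join chemical_additions c using (batch_id);

-- Fix: pre-aggregate the finer-grained table to batch grain first,
-- so the join is one-to-one and nothing is double-counted.
with additions as (
    select batch_id, sum(quantity_kg) as total_chemicals_kg
    from chemical_additions
    group by batch_id
)
select
    sum(b.batch_weight_kg)     as total_weight_kg,      -- correct
    sum(a.total_chemicals_kg)  as total_chemicals_kg
from batches b
left join additions a using (batch_id);
```

The inflated number in the first query is exactly the kind of "looks right and isn't" result the paragraph above warns about: it only reveals itself when compared against the source table's own total.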

Soda Core enters as a new tool for quality monitoring. dbt tests tell you whether individual values are valid. Soda Core tells you whether today's batch looks normal compared to yesterday's. They catch different kinds of failures.
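As a rough sketch of the split, a Soda Core check file might look like the following. The dataset name, expected row counts, and thresholds are assumptions, not the project's actual configuration:

```yaml
# checks.yml -- hypothetical SodaCL checks for the staged batch data
checks for stg_batches:
  - row_count between 80 and 100          # ~90 batches expected per day
  - missing_count(color_match_score) = 0  # row-level: no missing scores
  - avg(color_match_score) > 85           # dataset-level: flags a day
                                          # where every row is "valid"
                                          # but quality quietly slipped
```

A dbt test would pass on a day where every individual score is in range; the `avg` check is what notices that the whole distribution has shifted.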

Data governance enters for the first time. Roberto's data includes operator-level performance metrics. The question of who should see that data -- and whether the operators would want their individual performance visible on a dashboard -- is worth considering.

Tools

  • dbt Core + dbt-duckdb adapter -- transformation framework
  • DuckDB -- local analytical database
  • Soda Core -- quality monitoring with trend analysis (new this project)
  • Dagster -- orchestration with freshness policies and materialization scheduling
  • GitHub Actions -- CI/CD quality gates (new this project)
  • Claude Code -- your AI agent, with plan mode for decomposition
  • Git / GitHub -- version control

Materials

You'll receive:

  • Batch data -- a 30-row sample for initial exploration and a 900-row full dataset representing one day of production across all three lines
  • Pipeline spec template -- requirements pre-filled from Roberto's brief, design sections empty for you to complete
  • CLAUDE.md template -- the project governance file structure, yours to fill with the data dictionary and conventions you establish
  • Soda Core configuration guide -- installation, check file structure, running scans
  • GitHub Actions template -- CI/CD workflow for dbt tests and Soda Core checks