Learn by Directing AI

Step 1: Look at the data through a governance lens

The pipeline is complete. Models are transforming batch data correctly, quality checks are catching anomalies, CI/CD gates block bad changes, and Dagster freshness policies keep the data current. Before presenting results to Roberto, look at what the pipeline exposes.

The batch data includes operator_id. The mart models calculate operator-level quality metrics -- pass rates, average color match scores, re-dye rates per operator. Your fct_daily_quality model shows which operators consistently produce the best results and which ones have the highest re-dye rates.

Ask: who is this data about? Six operators run batches on Roberto's production lines. When you build a dashboard showing operator-level performance, those six people's work becomes visible to management. Their individual quality records are exposed.

Not all data is equal. Batch volume by line is operational data -- it describes machines and schedules. Operator-level pass rates are personal performance data -- they describe individuals. The sensitivity is different. A dashboard that shows "Line 1 had an 11% re-dye rate this week" is operational. A dashboard that shows "OP-004 had a 15% re-dye rate while OP-001 had a 3% re-dye rate" is personal.
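The difference between the two is often a single grouping key. Here is a minimal Python sketch, not the project's dbt model. It uses made-up records, and the `line` and `re_dyed` fields are assumptions that sit alongside the `operator_id` column named above. The same re-dye calculation yields operational data when grouped by line and personal data when grouped by operator.

```python
from collections import defaultdict

# Made-up batch records; "line" and "re_dyed" are assumed field names.
batches = [
    {"line": "Line 1", "operator_id": "OP-001", "re_dyed": False},
    {"line": "Line 1", "operator_id": "OP-001", "re_dyed": False},
    {"line": "Line 1", "operator_id": "OP-004", "re_dyed": True},
    {"line": "Line 1", "operator_id": "OP-004", "re_dyed": False},
    {"line": "Line 2", "operator_id": "OP-002", "re_dyed": False},
    {"line": "Line 2", "operator_id": "OP-003", "re_dyed": True},
]

def re_dye_rate_by(records, key):
    """Return {group: share of batches re-dyed}, grouped by the given field."""
    totals = defaultdict(int)
    redyes = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        redyes[r[key]] += int(r["re_dyed"])
    return {group: redyes[group] / totals[group] for group in totals}

# Operational view: describes machines and schedules.
line_view = re_dye_rate_by(batches, "line")
# Personal view: describes individuals, so it needs an access decision.
operator_view = re_dye_rate_by(batches, "operator_id")
```

Nothing in the calculation changes between the two views. The sensitivity comes from the grouping key, so it has to be a deliberate decision rather than something read off the code.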

Step 2: Review pipeline exposure

Look at your mart models and any views or reports you have built. Does the pipeline expose operator-level performance without any consideration of who can see it?

"Access control" is not a single concept. It includes:

Query access: Who can run queries against the warehouse?
Column-level security: Who can see the operator_id column versus who only sees aggregated line-level data?
Row-level security: Can a line supervisor see only their own line's operators?
Pipeline modification: Who can change the transformation logic?
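To make the separation concrete, here is a hypothetical Python sketch in which each layer is its own check. The role names and the policy table are invented for illustration and do not correspond to any warehouse's permission system. A real deployment would enforce these rules in the warehouse and the repository, not in application code.

```python
# Hypothetical roles and policies; each key is a separate access layer.
POLICIES = {
    "owner":      {"query": True,  "see_operator_id": True,  "lines": None,       "modify_pipeline": False},
    "supervisor": {"query": True,  "see_operator_id": True,  "lines": {"Line 1"}, "modify_pipeline": False},
    "buyer":      {"query": True,  "see_operator_id": False, "lines": None,       "modify_pipeline": False},
    "operator":   {"query": False, "see_operator_id": False, "lines": None,       "modify_pipeline": False},
    "engineer":   {"query": True,  "see_operator_id": False, "lines": None,       "modify_pipeline": True},
}

def visible_rows(role, rows):
    """Apply query access, then row-level, then column-level rules for one role."""
    policy = POLICIES[role]
    if not policy["query"]:  # query access
        raise PermissionError(f"{role} cannot query the warehouse")
    result = []
    for row in rows:
        # Row-level security: None means no line restriction.
        if policy["lines"] is not None and row["line"] not in policy["lines"]:
            continue
        # Column-level security: drop operator_id for roles that should not see it.
        if not policy["see_operator_id"]:
            row = {k: v for k, v in row.items() if k != "operator_id"}
        result.append(row)
    return result

def can_modify_pipeline(role):
    """Pipeline modification is decided independently of what a role can read."""
    return POLICIES[role]["modify_pipeline"]

rows = [
    {"line": "Line 1", "operator_id": "OP-001", "re_dyed": False},
    {"line": "Line 2", "operator_id": "OP-002", "re_dyed": True},
]
```

A buyer and a supervisor can both query, yet they see different rows and columns, and neither can change the pipeline. That is four separate answers, not one grant-or-deny switch.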

At this project's level, you are not implementing any of these controls. The point is noticing that they exist as separate concerns. AI tends to collapse access control into a binary "grant or deny" -- but the reality has layers. Roberto seeing individual operator performance is different from operators seeing each other's performance, which is different from a US brand buyer seeing individual operator performance.

Step 3: Discuss governance with Roberto

If the question of operator data visibility comes up in your conversation with Roberto, he has a thoughtful response. He does not want to punish operators -- he wants to learn from the best performers. Miguel and Sofia on Line 1 consistently produce the best results. Roberto wants to understand their technique so others can learn from it.

But he also recognizes the concern. "Good point. I want to learn from them, not punish anyone. But you're right, I should think about who sees the individual numbers." The question is planted. Roberto's intent is good, but intent does not control how data gets used once it is visible.

This is not a problem to solve in this project. It is a professional habit to develop -- asking "who is this data about?" before building the dashboard, not after. The technical implementation of column-level security and row-level access comes later. The awareness comes now.

Step 4: Present the quality analysis to Roberto

Send Roberto the results. Your analysis should show:

Re-dye rates by line over time
The impact of temperature and humidity on color match quality
Fabric-type-adjusted color match scores (using the normalized scale)
Operator performance patterns
Line-by-line comparison with the temperature conversion applied
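The line-by-line comparison only works once temperatures share a unit. Here is a minimal sketch that assumes, purely for illustration, that some batches record temperature in Fahrenheit and others in Celsius. The field names (`batch_date`, `temperature`, `temp_unit`, `re_dyed`) and the sample records are hypothetical. The sketch normalizes readings to Celsius, then reports the re-dye rate per line per ISO week.

```python
from collections import defaultdict
from datetime import date

def to_celsius(value, unit):
    """Normalize a temperature reading to Celsius ('F' and 'C' are assumed unit codes)."""
    if unit == "F":
        return (value - 32) * 5 / 9
    if unit == "C":
        return value
    raise ValueError(f"unknown temperature unit: {unit}")

def weekly_line_summary(batches):
    """Re-dye rate and mean temperature (Celsius) per (line, ISO year, ISO week)."""
    groups = defaultdict(list)
    for b in batches:
        iso_year, iso_week, _ = b["batch_date"].isocalendar()
        groups[(b["line"], iso_year, iso_week)].append(b)
    summary = {}
    for key, rows in groups.items():
        temps_c = [to_celsius(r["temperature"], r["temp_unit"]) for r in rows]
        summary[key] = {
            "re_dye_rate": sum(r["re_dyed"] for r in rows) / len(rows),
            "avg_temp_c": sum(temps_c) / len(temps_c),
        }
    return summary

# Made-up records: Line 3 is assumed to log Fahrenheit for this example only.
batches = [
    {"line": "Line 1", "batch_date": date(2024, 3, 4), "temperature": 60.0, "temp_unit": "C", "re_dyed": True},
    {"line": "Line 1", "batch_date": date(2024, 3, 5), "temperature": 62.0, "temp_unit": "C", "re_dyed": False},
    {"line": "Line 3", "batch_date": date(2024, 3, 4), "temperature": 140.0, "temp_unit": "F", "re_dyed": False},
]
summary = weekly_line_summary(batches)
```

Converting before grouping means every downstream comparison, whether across lines or across weeks, uses one unit.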

Roberto will confirm the numbers match his experience. He knows Line 1 has been struggling. He knows Miguel produces good results. When the data confirms what he sees on the factory floor, that is the ultimate verification -- domain knowledge validating pipeline output.

He will respond warmly. Roberto is proud of his operation and appreciates seeing the data organized in a way he can act on. He will tell a story about showing the operators some preliminary numbers and how Miguel suggested checking the settings from his best batches.

Step 5: Handle scope creep

After reviewing the results, Roberto will make a request: "Hey, one more thing. Can we also track chemical usage per batch? If I can see that higher re-dye rates correlate with lower chemical concentration, I can make the case to management for better chemicals instead of blaming the operators."

This is scope creep. Evaluate it professionally:

The data already includes chemical_concentration in every batch record. No new data source is needed.
The analysis is a correlation between an existing column and the re-dye rate you already calculate.
The work is feasible within the existing pipeline -- it is an additional metric in the mart model, not a new pipeline.
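The correlation itself is a small calculation. Here is a minimal Python sketch of the Pearson coefficient. It assumes each batch record carries `chemical_concentration` and a hypothetical boolean `re_dyed` flag. With a 0/1 outcome, this is the point-biserial correlation. The sample values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root of summed squared deviations; the 1/n factors cancel in the ratio.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)

# Made-up batch records; re_dyed is encoded as 1/0 for the calculation.
batches = [
    {"chemical_concentration": 2.0, "re_dyed": True},
    {"chemical_concentration": 2.5, "re_dyed": True},
    {"chemical_concentration": 3.0, "re_dyed": False},
    {"chemical_concentration": 3.5, "re_dyed": False},
]
r = pearson(
    [b["chemical_concentration"] for b in batches],
    [1 if b["re_dyed"] else 0 for b in batches],
)
```

A negative coefficient would support Roberto's argument that lower concentration goes with more re-dyes. Correlation alone does not show that the chemicals cause them, though, because temperature, humidity, and fabric type could be confounders.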

Whether you do it now or flag it for a future iteration is your decision. Either answer is professional. What matters is that you evaluate the request against the pipeline scope and respond with reasoning, not just "yes" or "no."

Step 6: Push and close

Push your final changes to GitHub. Make sure the CI pipeline passes on the final state. Write a README that describes what the pipeline does, how to run it, and what quality checks are in place.

Review your CLAUDE.md one more time. Does it still accurately describe the pipeline? If you added the chemical concentration analysis for Roberto, update the work breakdown and verification targets. The CLAUDE.md should reflect the final state of the project, not the initial plan.

The pipeline transforms Roberto's raw batch records into quality metrics that show which variables drive color match scores, how the three lines compare fairly with temperature conversion applied, and whether things are getting better or worse over time. dbt tests catch row-level problems. Soda Core catches batch-level anomalies. CI/CD gates prevent bad changes from reaching production. Dagster freshness policies ensure the data stays current.
