Learn by Directing AI
Unit 4

Build transformations with window functions and macros

Step 1: Design the operator performance model

With staging models producing clean temperatures in Celsius and color match scores passed through unchanged, you can start building the transformation layers.

The first intermediate model tracks operator performance trends. Roberto wants to know which operators produce better results, and whether their quality is consistent. That means looking at each operator's recent batches per line and calculating performance metrics.

A common approach: use a window function to rank each operator's batches by recency, then select the most recent ones for trend analysis. Direct Claude to build the intermediate model:

Build an intermediate model int_operator_performance that ranks each operator's batches by recency within each line. Use ROW_NUMBER() to assign a rank partitioned by operator_id and line_number, ordered by timestamp descending. Include the batch_id, color_match_score, pass_fail, and the row number rank.
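Before reviewing what Claude produces, it helps to know the rough shape to expect. A minimal sketch, assuming stg_batch_data is the staging model and timestamp is the batch timestamp column (both names carried over from the prompts in this unit):

```sql
-- models/intermediate/int_operator_performance.sql
-- Sketch only; Claude's generated version may differ in layout.
select
    operator_id,
    line_number,
    batch_id,
    color_match_score,
    pass_fail,
    row_number() over (
        partition by operator_id, line_number
        order by timestamp desc
    ) as recency_rank
from {{ ref('stg_batch_data') }}
```

Note the single-column ORDER BY inside the window -- Step 2 examines whether that ordering is actually deterministic.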

AI commonly generates window functions that look correct but have a subtle problem. The SQL will be syntactically valid, it will run without errors, and it will produce a result set with the right number of rows. Check the output carefully -- specifically, check whether the ordering is fully deterministic.

Step 2: Check for non-deterministic tie-breaking

The window function Claude generates will likely use ROW_NUMBER() OVER (PARTITION BY operator_id, line_number ORDER BY timestamp DESC). This looks reasonable. But what happens when two batches from the same operator on the same line share the same timestamp?

ROW_NUMBER assigns sequential integers -- 1, 2, 3 -- but when the ORDER BY column has ties, the assignment between tied rows is arbitrary. DuckDB (or any database) picks one, but the choice is not guaranteed to be the same on the next run. Run the query twice and you might get different "latest batch" selections.

Check the data for timestamp ties:

SELECT operator_id, line_number, timestamp, COUNT(*) as batch_count
FROM stg_batch_data
GROUP BY operator_id, line_number, timestamp
HAVING COUNT(*) > 1;

If this returns any rows, the window function has non-deterministic tie-breaking. The fix is a secondary sort column. Adding batch_id to the ORDER BY makes the ranking fully deterministic because batch IDs are unique:

ROW_NUMBER() OVER (PARTITION BY operator_id, line_number ORDER BY timestamp DESC, batch_id DESC)

This is a pattern worth recognizing. AI generates window functions with a single ORDER BY column because it looks clean. When the column has ties, the result is non-deterministic. Every window function you encounter should be checked: is the ORDER BY column unique within the partition? If not, add a tiebreaker.
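With the tiebreaker in place, selecting each operator's most recent batches for trend analysis becomes a simple filter downstream. A sketch, assuming the rank column is named recency_rank and a cutoff of the ten most recent batches (both assumptions):

```sql
-- Latest 10 batches per operator per line, stable across runs
select *
from {{ ref('int_operator_performance') }}
where recency_rank <= 10
```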

Step 3: Build the color match normalization macro

The color match score normalization logic needs to appear in multiple downstream models -- the intermediate quality calculations and the mart-level summaries both need normalized scores. Duplicating the formula in each model creates drift risk: if you change the normalization logic, you have to remember to change it everywhere.

A Jinja macro centralizes the logic. Direct Claude to create a macro that takes a fabric type and a raw score and returns a normalized 0-100 scale:

Create a dbt Jinja macro called normalize_color_score that takes fabric_type and raw_score as arguments. For polyester (Delta-E scale, lower is better): normalized = max(0, 100 - (raw_score * 16.67)). For cotton_blend (spectrophotometer, higher is better): normalized = raw_score. Return the normalized score on a 0-100 scale where higher is always better.
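One way the macro could look, following the formulas in the prompt (the file path and CASE structure are assumptions; the macro emits a SQL expression rather than computing in Jinja, so it can run against any column):

```sql
-- macros/normalize_color_score.sql
{% macro normalize_color_score(fabric_type, raw_score) %}
    case
        -- polyester: Delta-E scale, lower is better, so invert onto 0-100
        when {{ fabric_type }} = 'polyester'
            then greatest(0, 100 - ({{ raw_score }} * 16.67))
        -- cotton_blend: spectrophotometer reads 0-100, higher is better
        when {{ fabric_type }} = 'cotton_blend'
            then {{ raw_score }}
    end
{% endmacro %}
```

A downstream model then calls it inline, e.g. {{ normalize_color_score('fabric_type', 'color_match_score') }} as normalized_score, and the compiled SQL contains the CASE expression wherever the macro appears.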

The macro design decision itself matters. Not everything should be a macro. If logic appears in only one model, a macro adds indirection without benefit. If logic appears in two or more models and could change independently, a macro prevents drift. The color match normalization is a good candidate because the formula is non-obvious (that 16.67 factor) and appears in multiple places.

AI tends toward two extremes: creating macros for everything (over-abstracting) or duplicating logic across models (under-abstracting). The right answer is a judgment call about maintainability.

Step 4: Build the line quality intermediate model

The next intermediate model calculates quality metrics at the line level -- pass rates, average normalized scores, batch counts by line and date. This requires joining the batch-level staging data to line-level aggregations.

Direct Claude to build this model. Then check the row count. AI commonly joins tables at different grains without aggregating first. Join a batch-level table (one row per batch) to a true line-level summary (one row per line) and each batch matches exactly one summary row, so the count stays correct. But if the summary side is not actually one row per line -- say it carries one row per line per date -- each batch matches several rows and the result multiplies silently. A revenue total inflated 3x by a grain mismatch does not announce itself as wrong. It just looks like a big number.

Verify the join by checking counts:

SELECT COUNT(*) as total_batches FROM stg_batch_data;

Compare that to the row count in your intermediate model. If the intermediate model has more rows than the staging model, a grain mismatch has multiplied rows. Fix it by aggregating to the target grain before joining, or by adding the correct join conditions to prevent fan-out.
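The aggregate-first pattern looks like this. A sketch, assuming the line/date grain and the column names from the earlier prompts (the CTE name is an assumption):

```sql
-- models/intermediate/int_line_quality.sql (sketch)
with line_daily as (
    -- collapse batch-level rows to the target grain before any join
    select
        line_number,
        cast(timestamp as date) as batch_date,
        count(*) as batch_count,
        100.0 * sum(case when pass_fail then 1 else 0 end)
            / count(*) as pass_rate_pct
    from {{ ref('stg_batch_data') }}
    group by 1, 2
)
select * from line_daily
-- further joins now happen at one row per line per date,
-- so batch rows cannot fan out
```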

Step 5: Build the mart model

Build the mart model fct_daily_quality that shows daily re-dye rates by line, fabric type, and operator. This model uses the normalized color match score (via the Jinja macro) and the deterministic window function. It is the output Roberto will use to understand which variables drive quality.

Direct Claude to build the model, referencing the macro and the intermediate models. The mart should include:

  • Daily re-dye rate by line (percentage of batches that failed)
  • Average normalized color match score by line
  • Pass rate by operator
  • Batch count by line and fabric type
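A plausible skeleton for the mart. This sketch stays at batch grain (one row per batch, with date, line, fabric type, and operator as dimensions) so the verification query in Step 6 can count pass_fail directly; the daily rates in the bullets are then aggregations over this table. Column names beyond those already introduced are assumptions:

```sql
-- models/marts/fct_daily_quality.sql (sketch)
-- The full model would also join the intermediate models above; omitted here.
select
    cast(timestamp as date) as quality_date,
    line_number,
    fabric_type,
    operator_id,
    batch_id,
    pass_fail,
    {{ normalize_color_score('fabric_type', 'color_match_score') }}
        as normalized_score
from {{ ref('stg_batch_data') }}
```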

Step 6: Verify against Roberto's numbers

Roberto said the overall re-dye rate is about 8%. Query the mart model:

SELECT ROUND(100.0 * SUM(CASE WHEN pass_fail = false THEN 1 ELSE 0 END) / COUNT(*), 1) as redye_rate
FROM fct_daily_quality;

The result should be in the 8-12% range. If it is dramatically different, something is wrong in the transformation chain -- a grain mismatch inflating totals, a normalization error flipping pass/fail logic, or a window function selecting the wrong rows.

Run dbt build a second time on the same data. Compare the mart output from both runs. If the results differ, the window function tie-breaking is still non-deterministic.
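One concrete way to run that comparison inside DuckDB: snapshot the mart after the first build, rebuild, then diff with EXCEPT (the snapshot table name is an assumption):

```sql
-- after the first dbt build:
create table fct_daily_quality_run1 as
select * from fct_daily_quality;

-- after the second dbt build, both diffs should return zero rows:
select * from fct_daily_quality
except
select * from fct_daily_quality_run1;

select * from fct_daily_quality_run1
except
select * from fct_daily_quality;
```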

✓ Check

Run dbt build twice on the same data. Do the mart tables produce identical results both times?