Learn by Directing AI
Unit 4

The Fairness Audit

Step 1: Priya's equity push

Priya messages you on Slack. Her board just passed an equity policy, and she needs to demonstrate compliance. She wants to see the match quality scores broken down by region.

This is not a hypothetical exercise. Priya's board has made equitable placement a policy requirement, and she needs numbers she can present. The aggregate scores you computed in the previous unit told one story. The disaggregated scores will tell another.

Step 2: Disaggregate predictions by region

Open materials/fairness-audit-guide.md. The guide covers disaggregated evaluation, common fairness metrics, and intervention options.

Disaggregated evaluation means computing your metrics separately for each value of a demographic column -- in this case, nurse_region. Instead of one accuracy number, one F1 score, one precision, and one recall for the whole dataset, you get separate numbers for South India, West India, North India, East India, Northeast India, and Central India.

Direct Claude to compute per-region metrics for the model's predictions. AI commonly computes aggregate metrics without disaggregating -- the code runs, produces numbers, and those numbers look reasonable. You need to specify subgroup analysis explicitly.
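One way to sketch that instruction: loop over regions and compute the same metrics per group. The column names below (`nurse_region`, `placed` for the true label, `predicted` for the model output) are assumptions -- adapt them to your dataset.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def per_region_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Compute accuracy, precision, recall, and F1 separately per region.

    Assumes columns: nurse_region, placed (true label), predicted (model output).
    """
    rows = []
    for region, grp in df.groupby("nurse_region"):
        rows.append({
            "region": region,
            "n": len(grp),
            "accuracy": accuracy_score(grp["placed"], grp["predicted"]),
            "precision": precision_score(grp["placed"], grp["predicted"], zero_division=0),
            "recall": recall_score(grp["placed"], grp["predicted"], zero_division=0),
            "f1": f1_score(grp["placed"], grp["predicted"], zero_division=0),
        })
    return pd.DataFrame(rows).set_index("region")
```

The `n` column matters: a region with few rows will have noisy metrics, and that context belongs in any table you show Priya.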

Step 3: Discover the disparity

The disaggregated metrics reveal what the aggregate hid. Nurses from Northeast India have significantly lower match quality scores than nurses from other regions.

Look at the numbers. The overall score looked good. But broken down by region, one group is consistently worse. Not by a small margin -- the gap is substantial.

Is this because Northeast nurses are less qualified? Check the data. Their certifications, experience levels, and specializations are comparable to nurses from other regions. There is no qualification gap to explain the disparity.
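A quick way to make that check concrete is to compare qualification features across regions. The column names here (`years_experience`, `num_certifications`) are hypothetical -- substitute whatever qualification columns your dataset actually has.

```python
import pandas as pd

def qualification_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean qualification features per region.

    Comparable values across regions are evidence against a
    qualification-based explanation for the score gap.
    Assumes columns: nurse_region, years_experience, num_certifications.
    """
    return df.groupby("nurse_region")[["years_experience", "num_certifications"]].mean()
```

If the Northeast row looks like every other row here while its match scores do not, the disparity is coming from somewhere other than qualifications.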

The disparity is in the training data itself. Historically, hospitals have placed Northeast nurses at lower rates. The model learned that pattern -- not as a judgment about qualifications, but as a statistical correlation in the data. The model is faithfully reproducing historical placement bias as a prediction.

Step 4: Investigate and understand why

This is worth sitting with. The model did exactly what it was trained to do. It found patterns in the data. One of those patterns is that nurses from a particular region were historically placed less often. The model has no concept of fairness -- it optimized the objective function you gave it, using all the patterns available.

The training data encoded historical bias. The data determined what the model learned -- including the biases.

If the session has become long and complex, consider starting a fresh Claude session with consolidated context. Pull together what has been decided: the Pipeline architecture, the model configuration, the evaluation criteria, and the fairness finding. A clean session with focused context avoids context degradation -- earlier constraints falling out of effective attention.

Step 5: Apply fairness interventions

The fairness audit guide describes several intervention options: rebalancing the training data, adjusting thresholds per group, or adding fairness constraints to the training objective.

Each approach has trade-offs. Rebalancing addresses the root cause but may reduce overall accuracy. Threshold adjustment preserves the model but may increase false positives for some groups. Constraint-based training optimizes for both accuracy and fairness simultaneously but may reduce peak performance.
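The first option, rebalancing, can be sketched as simple oversampling: duplicate positive examples from under-placed regions until every region's placement rate in the training data matches the highest-rate region. This is one possible rebalancing scheme under assumed column names (`nurse_region`, `placed`), not the only one.

```python
import pandas as pd

def rebalance_by_region(df: pd.DataFrame, region_col: str = "nurse_region",
                        label_col: str = "placed", seed: int = 0) -> pd.DataFrame:
    """Oversample positives in under-placed regions so every region's
    placement rate matches the highest-rate region."""
    rates = df.groupby(region_col)[label_col].mean()
    target = rates.max()
    if target >= 1:  # every row in the top region is positive; nothing sensible to match
        return df.copy()
    extra = []
    for region, rate in rates.items():
        if rate <= 0 or rate >= target:
            continue
        grp = df[df[region_col] == region]
        pos = grp[grp[label_col] == 1]
        # Solve (n_pos + k) / (n + k) == target for the number of extra positives k.
        k = int(round((target * len(grp) - len(pos)) / (1 - target)))
        if k > 0:
            extra.append(pos.sample(n=k, replace=True, random_state=seed))
    return pd.concat([df] + extra, ignore_index=True)
```

The duplicated rows change what the model sees during training, which is why this targets the root cause -- and also why it can shift aggregate accuracy.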

This is a genuine design choice. You decide which intervention fits Priya's situation -- a staffing platform where equitable access to placements is now a board-level policy requirement.

Apply your chosen intervention and recompute the disaggregated metrics. The gap should narrow substantially. Overall performance may decrease slightly -- that is the trade-off, and it is worth understanding honestly.
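If you chose threshold adjustment instead, a minimal sketch looks like this: pick a per-region score cutoff so each region's positive-prediction rate matches a target (a demographic-parity style adjustment). Column names (`match_score`, `nurse_region`) and the target rate are assumptions for illustration.

```python
import pandas as pd

def per_region_thresholds(df: pd.DataFrame, score_col: str = "match_score",
                          region_col: str = "nurse_region",
                          target_rate: float = 0.5) -> dict:
    """For each region, choose the score cutoff whose positive-prediction
    rate equals target_rate within that region."""
    return {
        region: grp[score_col].quantile(1 - target_rate)
        for region, grp in df.groupby(region_col)
    }

def apply_thresholds(df: pd.DataFrame, thresholds: dict,
                     score_col: str = "match_score",
                     region_col: str = "nurse_region") -> pd.Series:
    """Re-predict using each row's region-specific cutoff."""
    cutoffs = df[region_col].map(thresholds)
    return (df[score_col] >= cutoffs).astype(int)
```

After applying either intervention, rerun the same per-region metric computation from Step 2 so the before and after numbers are directly comparable.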

Step 6: Communicate findings to Priya

Priya needs to present this to her board. She does not need to understand demographic parity ratios or equalized odds. She needs to know three things: what the disparity was, why it existed, and what was done about it.

Translate the findings into terms she can use. "Nurses from the Northeast were being matched at a lower rate -- not because of their qualifications, which are comparable to other regions, but because the historical data reflects patterns where those nurses were placed less often. The model was learning from that history and reinforcing it. We adjusted the model so placement rates are fair across regions. Overall match quality dropped slightly -- from X to Y -- but the system no longer systematically disadvantages nurses from any region."

Expect Priya to be alarmed by the disparity but appreciative of the transparency. She will ask about the trade-off: does making it fairer mean the matches are worse overall? Answer honestly. A small decrease in aggregate match quality is the cost of ensuring the system does not encode historical bias. Whether that trade-off is acceptable is ultimately her decision, but the numbers should make it informed.

✓ Check

Check: The disaggregated metrics show the regional disparity before intervention (Northeast India placement score at least 15% lower than the mean), and the post-intervention metrics show the gap reduced to within 5% while overall performance remains above the baseline.
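The check above can be expressed as assertions over per-region mean scores. The dict-based shape and the helper name here are illustrative, not part of the course materials.

```python
def passes_fairness_check(pre_scores: dict, post_scores: dict,
                          baseline: float, region: str = "Northeast India") -> bool:
    """pre_scores / post_scores map region -> mean placement score.

    Checks the three conditions from the unit: pre-intervention disparity
    (region at least 15% below the mean), post-intervention gap within 5%
    of the mean, and overall performance above the baseline.
    """
    pre_mean = sum(pre_scores.values()) / len(pre_scores)
    post_mean = sum(post_scores.values()) / len(post_scores)
    disparity_shown = pre_scores[region] <= 0.85 * pre_mean
    gap_closed = abs(post_scores[region] - post_mean) <= 0.05 * post_mean
    overall_ok = post_mean > baseline
    return disparity_shown and gap_closed and overall_ok
```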