Step 1: Detection without response is noise
The drift detection system works. It catches the distributional shifts from the hospital mergers and shift pattern changes. But an alert that says "drift detected" without specifying who should investigate, what to check first, or what action to take is operationally useless.
Priya's team needs to know: when an alert fires, what happens next? Who investigates? What do they check first? Under what conditions do they retrain the model versus wait versus escalate to Ravi?
The response plan turns monitoring from a notification system into a decision system.
Step 2: Design the response plan
Open materials/response-plan-template.md. It has four sections: severity levels, response procedures, stakeholder communication, and retraining criteria.
For each severity level, you design what happens. Low severity might mean a small distributional shift that does not yet affect match quality -- log it, monitor it, review at the next monthly check. High severity means a large shift that is likely degrading predictions now -- investigate immediately, retrain if confirmed, notify Priya.
The boundaries between severity levels are design decisions. Where you draw the line between "monitor" and "investigate" depends on the cost of each response. For MedConnect, bad placements have real consequences for nurses and hospitals. The severity thresholds should reflect that.
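One way to make those design decisions concrete is to encode the severity levels as data rather than prose. This is a minimal sketch, assuming PSI as the drift metric; the cut-offs shown (0.10 and 0.25) are the common rule-of-thumb defaults, not MedConnect's calibrated values, and the level names and response text are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SeverityLevel:
    name: str
    psi_threshold: float  # minimum PSI that triggers this level
    response: str         # the procedure from the response plan


# Ordered from highest to lowest severity; thresholds are placeholder
# rule-of-thumb values, to be replaced by the calibrated ones in Step 3.
SEVERITY_LEVELS = [
    SeverityLevel("high", 0.25,
                  "investigate immediately; retrain if confirmed; notify Priya"),
    SeverityLevel("medium", 0.10,
                  "investigate within one business day; check recent match quality"),
    SeverityLevel("low", 0.0,
                  "log the shift; review at the next monthly check"),
]


def classify(psi: float) -> SeverityLevel:
    """Return the highest severity level whose threshold the PSI meets."""
    for level in SEVERITY_LEVELS:
        if psi >= level.psi_threshold:
            return level
    return SEVERITY_LEVELS[-1]
```

Keeping the levels in one ordered structure means the boundaries live in exactly one place, so moving the "monitor" versus "investigate" line later is a one-line change rather than an edit scattered across alerting code.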
Step 3: Calibrate thresholds to Priya's context
The default PSI threshold of 0.25 may not be appropriate for healthcare staffing, where a wrong match has real consequences for patients and nurses.
When directing Claude to help design the thresholds, include the business context: 400 placements per quarter, hospitals that depend on timely matches, the equity commitments from P6's fairness audit. The thresholds are not just statistical -- they encode how sensitive the monitoring should be for this specific client.
A financial trading model might need extremely tight thresholds because the cost of a wrong prediction is immediate and large. A content recommendation model might tolerate wider thresholds because a slightly off recommendation has low consequences. MedConnect is in between -- the consequences are real but not catastrophic on a single-placement level. They become serious if the drift goes undetected across many placements.
Step 4: Integrate drift monitoring into the CI/CD pipeline
The CI/CD pipeline from Unit 2 gates model quality. Now add drift monitoring as a second layer. Add a step to the GitHub Actions workflow that runs drift detection on the latest production data.
The eval gate is a hard block: if the model fails eval, deployment stops. Drift monitoring is a graduated response: low-severity drift produces a warning, high-severity drift can block deployment or trigger an alert.
The two systems together form a two-layer safety net: the eval gate catches bad models, drift monitoring catches degraded data. Neither alone is sufficient. A model that passes eval on the test set might still perform poorly on drifted production data.
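A sketch of what the workflow step might run: the PSI computation below is the standard formula, but the threshold values are the rule-of-thumb defaults and the `gate` function's severity boundaries are assumptions to be replaced with the calibrated ones. GitHub Actions renders `::warning::` lines as annotations without failing the step, and treats a nonzero exit code as a failed step, which gives exactly the graduated response described above.

```python
import numpy as np


def compute_psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference and a production sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) in empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


def gate(psi: float) -> int:
    """Exit code for the CI step: 0 = proceed (possibly with a warning), 1 = block."""
    if psi >= 0.25:  # high severity: block deployment (placeholder threshold)
        print(f"::error::High-severity drift (PSI={psi:.3f}); blocking deployment")
        return 1
    if psi >= 0.10:  # low/medium severity: annotate but do not block
        print(f"::warning::Drift detected (PSI={psi:.3f}); logged for review")
    return 0
```

In the workflow, the step would load the reference and latest production data, call these two functions, and `sys.exit(gate(psi))`; the eval gate step stays a separate, unconditional hard block before it.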
Step 5: Connect the system
Share the response plan with Priya. She reviews the severity levels and response procedures.
She pushes back on one thing: the "wait and monitor" response for low-severity drift. "In healthcare staffing, waiting means we might make bad placements. Can we at least flag those hospitals for manual review while we wait?"
She is right. Adjust the low-severity response to include flagging affected hospitals for manual review by the operations team. This is a small change to the plan, but it reflects the operational reality -- even low-severity drift in healthcare staffing has consequences that a content recommendation system would not.
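The adjusted low-severity response can be a small addition to the drift report: when per-hospital drift scores are available, surface the affected hospitals to the operations team instead of only logging. The function name, the dictionary shape, and the 0.05 watermark below are all hypothetical, chosen to illustrate the change.

```python
def flag_for_manual_review(per_hospital_psi: dict[str, float],
                           low_watermark: float = 0.05) -> list[str]:
    """Hospitals showing any detectable shift, even below the investigate line.

    These get routed to the operations team for manual placement review
    while the drift is monitored -- the low-severity response Priya asked for.
    """
    return sorted(h for h, psi in per_hospital_psi.items() if psi >= low_watermark)
```

Because this runs on every low-severity alert, the "wait and monitor" state still produces a concrete action for operations, which is what distinguishes it from a pure notification.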
Commit the response plan and the updated CI/CD workflow.
Check: The response plan specifies at least three severity levels with distinct response procedures, and the CI/CD pipeline includes a drift monitoring step that produces warnings or failures based on severity.