Learn by Directing AI
Unit 5

Delivering the System

Step 1: End-to-end test -- the pass path

The system has three components: the CI/CD pipeline with eval gates, the drift detection system, and the response plan that connects them. Before delivering to Priya and Ravi, verify the whole thing works.

Push a code change that triggers the CI/CD pipeline. Watch the GitHub Actions run. The eval gate should run and pass. The drift monitoring step should run and report results. All steps should complete successfully.
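The overall shape of such a workflow might look like this. This is a minimal GitHub Actions sketch, not the project's actual workflow: the job name, script paths, and file names are assumptions.

```yaml
# .github/workflows/model-ci.yml -- hypothetical file and script names throughout
name: model-ci
on: [push, pull_request]

jobs:
  eval-and-monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Eval gate (must block deploy on failure)
        run: python scripts/eval_gate.py metrics.json
      - name: Drift monitoring (reports current drift scores)
        run: python scripts/drift_report.py
```

Because each `run` step fails the job when its command exits nonzero, the eval gate step doubles as the enforcement point.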

Check the drift monitoring output. It should report the current state of drift for the monitored features. The features that drifted (shift patterns, hospital requirements for merged hospitals) should show elevated scores. The features that did not drift should show stable scores.
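The drift scores in that output can be sanity-checked by hand. Here is a minimal Population Stability Index (PSI) sketch in pure Python; the function name, bin count, and smoothing constant are illustrative, not the project's implementation:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a current sample."""
    lo, hi = min(expected), max(expected)
    # Bin edges come from the baseline so both samples are binned identically.
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        n = len(sample)
        # Tiny smoothing term avoids log(0) for empty bins.
        return [(c + 1e-6) / (n + bins * 1e-6) for c in counts]

    p, q = fractions(expected), fractions(actual)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))
```

Identical distributions score near zero; a strongly shifted feature scores well above the commonly used 0.25 alert level.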

Step 2: End-to-end test -- the fail path

Now test the failure path. Push a change with a deliberately poor model or modify the eval thresholds to be unreachable. The pipeline should fail at the eval gate step with a clear error message.

Verify that the pipeline actually blocks. A common failure: the eval step runs and prints "FAIL" to the log, but the step itself exits with code 0 -- meaning the pipeline considers it a success. The gate must exit with code 1 on failure. If it does not block, the entire CI/CD system is theater.
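One way to guarantee the gate blocks is to tie the pass/fail decision directly to the process exit code. A sketch, where the metric names, threshold values, and metrics-file format are assumptions rather than the project's real eval suite:

```python
import json
import sys

# Hypothetical gated metrics and floors -- substitute the project's actual eval suite.
THRESHOLDS = {"accuracy": 0.90, "macro_f1": 0.85}

def gate(metrics: dict, thresholds: dict) -> bool:
    """Print any failures and return True only if every gated metric meets its floor."""
    failures = [(name, metrics.get(name, 0.0), floor)
                for name, floor in thresholds.items()
                if metrics.get(name, 0.0) < floor]
    for name, value, floor in failures:
        print(f"FAIL: {name}={value} is below threshold {floor}")
    return not failures

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        results = json.load(f)
    # Printing "FAIL" is not enough: exit nonzero so CI marks the step as failed.
    sys.exit(0 if gate(results, THRESHOLDS) else 1)
```

The `sys.exit(1)` on failure is the whole point: CI systems decide pass/fail from the exit code, not from what the log says.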

Check the pull request view on GitHub. A failed pipeline should show a red status check that prevents merging. This is the structural enforcement Ravi asked for -- not a reminder to run checks, but a gate that prevents bad code from reaching production.

Step 3: Write the delivery summary

Write a summary for Priya and Ravi. Two audiences, two needs.

For Priya: what the system does in operational terms. The monitoring watches for changes in the placement data. If it detects a significant change, it alerts the team and flags affected hospitals for manual review. The response plan specifies what to do at each severity level. She does not need to understand PSI scores or KS tests -- she needs to know when to worry and what to do.

For Ravi: the CI/CD architecture. Every model update runs through the eval suite. If the model does not pass thresholds, deployment is blocked. The drift monitoring runs as an additional step. The workflow is version-controlled and automated. He can see exactly what checks run and what thresholds are enforced.

Step 4: Push to GitHub

Push the final version to GitHub. Write a README that documents:

  • What the CI/CD pipeline does and how to configure eval thresholds
  • What the drift detection system monitors and how to adjust thresholds
  • The response plan: severity levels, response procedures, stakeholder communication
  • How to run the pipeline locally for testing
  • How to add new eval metrics or drift monitoring features

The README is for the next practitioner -- someone who needs to operate this system without having built it. Every decision you made (which metrics to gate on, which features to monitor, what thresholds to set) should be documented with the reasoning, not just the values.

Step 5: Update CLAUDE.md

Update the CLAUDE.md file with the final project state: what was built, what the current thresholds are, where the key configuration files live. This is the project governance file that Claude reads at session start -- keeping it current means the next session picks up where this one left off.
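A skeleton of what the updated file might record. Section names, paths, and values here are placeholders, not the project's actual state:

```markdown
# CLAUDE.md (project state -- placeholder values)

## What was built
- CI/CD pipeline with eval gates (workflow file under .github/workflows/)
- Drift detection with statistical checks on monitored features
- Response plan linking drift severity levels to actions

## Current thresholds
- Eval gate floors: defined in the gate script (document actual values here)
- Drift alert level: e.g. PSI > 0.25 on any monitored feature (example value)

## Key files
- Workflow definition, eval gate script, drift config, response plan document
```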

Share the complete system with Priya and Ravi. Priya is relieved: "This is exactly what I needed. We'll know when something changes before it affects placements." Ravi reviews the CI/CD architecture and approves: "No bad model goes live. That's what I wanted." He asks one follow-up about how to add new eval metrics later -- walk him through the workflow.

✓ Check

The end-to-end test shows both the pass path (a good model deploys) and the fail path (a bad model is blocked) working correctly, and the drift monitoring produces appropriate results on the production dataset.

Project complete

Nice work. Ready for the next one?