Learn by Directing AI
Unit 3

Health Checks and Model Versioning

Step 1: Review the Health Check and Versioning Tickets

Open materials/tickets.md and read T-04 (health check endpoint), T-05 (model versioning), and T-06 (test health check by simulating failure). Together these tickets address two related problems: knowing whether the service is ready, and knowing which model produced each prediction.

A health check answers a specific question: "Is this service ready to serve predictions right now?" The answer requires more than confirming the process is running. The process was running last Tuesday when Emeka's API went down -- the model file had failed to load but the server was still up. A health check that says "everything is fine" when the model isn't loaded is a health check that lies.

Step 2: Design and Build the Health Check

Direct Claude to add a /health endpoint that verifies the model is actually loaded and functional. Something like: "Add a GET /health endpoint that checks whether the model file is loaded and the model can produce a prediction on a reference input. Return model status and version. Return 503 if the model is not available."

Review what Claude produces. AI commonly generates health checks that return 200 unconditionally -- return {"status": "ok"} -- without checking anything meaningful. If the health check doesn't actually verify the model is loaded and functional, direct Claude to add the real checks.
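For comparison while you review, here is a minimal sketch of what a real health check looks like. It is framework-agnostic (you'd wrap it in your route handler), and the names -- health_status, REFERENCE_INPUT, the hardcoded MODEL_VERSION -- are illustrative placeholders, not the course repo's actual code:

```python
MODEL_VERSION = "v2.1"  # placeholder; Step 4 reads this from metadata instead
REFERENCE_INPUT = [[5.1, 3.5, 1.4, 0.2]]  # any known-good input for the model

def health_status(model):
    """Return (payload, http_status): 503 whenever the model can't serve."""
    if model is None:
        # The process is up but the model never loaded -- last Tuesday's bug.
        return {"status": "unhealthy", "error": "model not loaded"}, 503
    try:
        model.predict(REFERENCE_INPUT)  # prove the model actually works
    except Exception as exc:
        return {"status": "unhealthy", "error": str(exc)}, 503
    return {"status": "ok", "model_version": MODEL_VERSION}, 200
```

The key difference from the unconditional return {"status": "ok"} version: this one can say no.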

Step 3: Test the Health Check by Simulating Failure

This is the infrastructure payoff. Direct Claude to test the health check by temporarily removing or renaming the model file, then calling the /health endpoint.

When the model is present, /health should report healthy with the model version. When the model is gone, /health should report unhealthy with a clear error. This is exactly what would have caught last Tuesday's outage -- instead of Adaeze discovering the failure two hours later, the health check would have flagged it immediately.

Restore the model file and verify /health returns to healthy.
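The simulation Claude runs can be as simple as the sketch below: rename the file, check health, restore. The loader, file name, and pickle format are assumptions for illustration -- your project's actual model path and serialization may differ:

```python
import os
import pickle
import tempfile

def load_model(path):
    """Return the model, or None if the file is missing or unreadable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, pickle.UnpicklingError):
        return None

def check_health(path):
    return "ok" if load_model(path) is not None else "unhealthy"

# Simulate the outage: model present, model gone, model restored.
with tempfile.TemporaryDirectory() as d:
    model_path = os.path.join(d, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump({"weights": [0.1, 0.2]}, f)  # stand-in for a real model

    assert check_health(model_path) == "ok"         # healthy while present
    os.rename(model_path, model_path + ".bak")      # simulate the failure
    assert check_health(model_path) == "unhealthy"  # caught immediately
    os.rename(model_path + ".bak", model_path)      # restore
    assert check_health(model_path) == "ok"         # back to healthy
```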

Step 4: Add Model Versioning to Prediction Responses

Direct Claude to include the model version in every prediction response. The version should come from model metadata or a version file -- not be hardcoded in the source code. Every response should look something like {"prediction": 1, "probability": 0.73, "model_version": "v2.1"}.

Without versioning, "the model gave a wrong prediction on Tuesday" is undiagnosable. Which model? The one from last week, or the one the data team was testing? Model versioning is the serving layer's contribution to the evaluation chain.
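One way to keep the version out of the source code is a small metadata file written at training time and read once at startup. The file name and JSON layout below are assumptions for the sketch, not a fixed convention:

```python
import json
import os
import tempfile

def load_model_version(metadata_path):
    """Read the version string from a JSON metadata file next to the model."""
    with open(metadata_path) as f:
        return json.load(f)["model_version"]

def prediction_response(prediction, probability, version):
    """Envelope every prediction so the version travels with each result."""
    return {"prediction": prediction,
            "probability": probability,
            "model_version": version}

with tempfile.TemporaryDirectory() as d:
    meta = os.path.join(d, "model_metadata.json")
    with open(meta, "w") as f:
        json.dump({"model_version": "v2.1", "trained": "2024-06-01"}, f)

    version = load_model_version(meta)  # loaded once at startup
    print(prediction_response(1, 0.73, version))
    # → {'prediction': 1, 'probability': 0.73, 'model_version': 'v2.1'}
```

Now "the model gave a wrong prediction on Tuesday" comes with the version attached, so the question "which model?" answers itself.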

Step 5: Test the Complete Infrastructure

Run the full API with all infrastructure in place: input validation, structured errors, health check, and model versioning. Send a few requests and verify:

  • A valid prediction request returns a response with the prediction, probability, and model_version field
  • An invalid request returns a structured error (from Unit 2)
  • The /health endpoint reports the model load status and version
  • Removing the model file causes /health to report unhealthy
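The four checks above can also be expressed as a compact smoke test. The sketch below uses in-memory stubs in place of a live server, and the handler names (predict_endpoint, health_endpoint) are illustrative -- the point is the contract each check asserts:

```python
def predict_endpoint(payload, model, version):
    """Validate input, then predict -- mirrors the Unit 2 error contract."""
    if not isinstance(payload.get("features"), list):
        return {"error": {"code": "invalid_input",
                          "message": "features must be a list"}}, 400
    return {"prediction": model.predict(payload["features"]),
            "probability": 0.73,  # stub value for the sketch
            "model_version": version}, 200

def health_endpoint(model, version):
    if model is None:
        return {"status": "unhealthy"}, 503
    return {"status": "ok", "model_version": version}, 200

class StubModel:
    def predict(self, features):
        return 1

model, version = StubModel(), "v2.1"

body, code = predict_endpoint({"features": [5.1, 3.5]}, model, version)
assert code == 200 and body["model_version"] == "v2.1"   # versioned response

body, code = predict_endpoint({"features": "oops"}, model, version)
assert code == 400 and body["error"]["code"] == "invalid_input"  # structured error

assert health_endpoint(model, version)[1] == 200          # healthy with model
assert health_endpoint(None, version)[1] == 503           # model file "removed"
```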

Everything that was missing in the baseline API from Unit 1 is now in place for the serving layer. The API validates input, reports its own health, versions its responses, and communicates errors clearly.

✓ Check

Check: The /health endpoint returns a response that includes model load status. Prediction responses include a model_version field. When the model file is removed, the health check reports unhealthy.