The Brief
Emeka is back again. The churn model from P2 is running and his retention team relies on it weekly. But last Tuesday the API went down for two hours and nobody knew until Adaeze on his team tried to pull the weekly list and got an error. She ended up making the retention calls manually from last week's list.
Two things worry him. The API has no monitoring -- nothing tells anyone when it stops working. And his data team wants to try different model settings to improve results, but they're afraid of breaking the one prediction system the retention team depends on. If anything goes wrong, nobody else can reproduce what was built.
Your Role
You're making the churn prediction system reliable. The model works. The API serves predictions. What's missing is the infrastructure that makes it professional: input validation that catches bad requests before they reach the model, health checks that report when the service is actually down, model versioning so every prediction can be traced back to the model that produced it, and experiment tracking that lets the data team iterate safely and reproducibly.
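As a rough sketch of what range-aware input validation can look like, a Pydantic model can reject out-of-range values before they ever reach the model. The field names and bounds below are hypothetical; the real ones come from the data profile in your materials:

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical churn features -- real field names and bounds
# should come from the training data profile.
class ChurnRequest(BaseModel):
    tenure_months: int = Field(ge=0, le=600)   # range check, not just type
    monthly_charges: float = Field(gt=0)
    contract_type: str

# A negative tenure is the right type (int) but outside the training
# data's range, so validation rejects it before the model sees it.
try:
    ChurnRequest(tenure_months=-5, monthly_charges=70.0, contract_type="monthly")
except ValidationError as e:
    print("rejected field:", e.errors()[0]["loc"])
```

The design point: a plain `int` annotation would accept `-5`, which is exactly the "checks type but not range" gap this project asks you to catch.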
You direct Claude Code through the infrastructure work. The data and model are familiar from P1-P2. What's new is the infrastructure layer around them.
What's New
Last time you built the full artifact creation pipeline: PRD, evaluation design, preprocessing, training, serving, documentation. You made the decisions about metrics and encoding.
This time the data and model are settled. The terrain shifts to reliability and reproducibility. You'll encounter AI generating infrastructure code that looks thorough but silently fails -- validation that checks type but not range, health checks that always say "everything is fine," experiment logging that records hardcoded values instead of actual variables. Catching these failures is the new verification challenge.
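The health-check failure mode is easy to see in miniature. This is an illustrative sketch, not the project's actual code (the `model` global stands in for whatever dependency the endpoint guards): a check that returns a constant can never report an outage, while a real one exercises the thing it claims to cover.

```python
# Illustrative sketch: a health check that always reports "ok"
# versus one that actually exercises the model it guards.

model = None  # simulate the model failing to load

def health_fake():
    # Looks like a health check, but can never fail.
    return {"status": "ok"}

def health_real():
    # Actually touches the dependency the check claims to cover.
    if model is None:
        return {"status": "unhealthy", "reason": "model not loaded"}
    return {"status": "ok"}

print(health_fake())  # {'status': 'ok'} -- even though the model is down
print(health_real())  # {'status': 'unhealthy', 'reason': 'model not loaded'}
```

Both versions pass a casual glance at the code, which is why verifying behavior (curl the endpoint with the model deliberately broken) matters more than reading it.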
Tools
- Python -- infrastructure code, validation models, health checks
- FastAPI / uvicorn -- serving (familiar, now enhanced)
- Pydantic -- input validation (new at this level of detail)
- MLflow -- experiment tracking as infrastructure (deepening from P2)
- scikit-learn -- model training (familiar)
- Claude Code -- AI direction
- Git / GitHub -- version control
- curl -- API testing
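Experiment tracking only reproduces what you actually log. A miniature of the hardcoded-value pitfall named above, using a hypothetical stand-in `log_param` instead of MLflow's real API so the sketch is self-contained:

```python
# Stand-in for an MLflow-style log_param, so the sketch is self-contained.
run_record = {}

def log_param(key, value):
    run_record[key] = value

n_estimators = 200  # the value the model was actually trained with

log_param("n_estimators", 100)           # pitfall: a hardcoded literal
log_param("n_estimators", n_estimators)  # correct: the variable itself

# Only the second call makes the run record match the trained model.
print(run_record["n_estimators"])  # 200
```

The first call produces a run record that silently disagrees with the model it describes, which is exactly the reproducibility failure the data team fears.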
Materials
You receive:
- Emeka's email about the API outage and his team's concerns
- The P2 API code as a starting point (basic endpoint without infrastructure)
- A data profile showing the training data's ranges and types (for validation boundaries)
- A ticket breakdown covering all infrastructure work
- A project governance file (CLAUDE.md)