Step 1: Review the Serving Tickets
Check materials/CLAUDE.md for the project structure if you need a reminder. Open materials/tickets.md and find the serving tickets: T10, T11, and T12. Three tasks: build a FastAPI endpoint that loads the trained model and returns churn predictions, test it with valid and invalid input, and add request logging.
Until now, the model lives in a notebook. A Jupyter notebook is a document that mixes code, output, and notes in one place -- you can run sections independently and see results inline. The model exists only while the notebook is running; close it, and the model is gone from memory. Emeka's retention team cannot open your notebook every Monday morning and paste subscriber data into a cell. They need something they can call from their own systems — an API endpoint that accepts subscriber features and returns a churn probability.
That is a categorical shift. A notebook model is a tool for the person who built it. An API model is a service for anyone who knows the endpoint.
Step 2: Serialize the Trained Model
Before the API can serve predictions, the trained model needs to exist as a file on disk. Right now it exists as a Python object in your notebook's memory. Serialization means saving a Python object to a file so another program can load it later. Joblib is the library commonly used for this with ML models. Serializing the trained model saves its learned parameters, its structure — everything it needs to make predictions — to a file that another script can load.
Direct Claude to serialize the trained model using joblib. Something like: "Save the trained RandomForest model to a file called model.joblib using joblib."
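If it helps to see the shape of what Claude will produce, here is a minimal sketch of the round trip. The toy training data here is a stand-in, not the project's real subscriber data:

```python
# Serialize a trained model with joblib, then load it back.
# The tiny training set is a stand-in for the real project data.
import joblib
from sklearn.ensemble import RandomForestClassifier

X = [[1, 20.0], [24, 85.5], [3, 30.0], [36, 99.9]]  # e.g. tenure, charges
y = [1, 0, 1, 0]                                     # 1 = churned

model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

joblib.dump(model, "model.joblib")      # freeze the trained model to disk

reloaded = joblib.load("model.joblib")  # any other script can do this
assert (reloaded.predict(X) == model.predict(X)).all()
```

The file on disk is all another program needs — no notebook, no training code.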
The result is a frozen snapshot. It captures the model's parameters as they are right now — the exact split thresholds, feature importances, and class weights the RandomForest learned during training. What it does not capture is the preprocessing pipeline, the training data, or the decisions you made along the way. If someone loads model.joblib next month, they get predictions but no context about how the model was built or what data it expects. That gap matters, and it will matter more in later projects.
Step 3: Build the FastAPI Endpoint
An API (Application Programming Interface) here means a web service: a program that listens for requests over the network and sends back responses. An endpoint is a specific URL the API responds to. A POST request sends data to the API (in this case, subscriber features) and gets a result back (a churn prediction).
Direct Claude to create the FastAPI app. Be specific about what the endpoint should do: "Build a FastAPI app in app.py that loads model.joblib, accepts subscriber features as JSON via a POST endpoint at /predict, and returns a JSON response with a churn probability and a binary prediction."
Keep this a focused request. You are asking for one thing: an endpoint that loads the model and serves predictions. Not logging, not error handling, not deployment configuration. One task, one prompt. The serving code, the edge case handling, and the logging are separate tickets for a reason — each is a distinct piece of work that you can verify independently.
AI-generated serving code commonly handles the straightforward case well — valid JSON in, prediction out. What it tends to skip is the translation layer between raw input and what the model actually expects. The model was trained on preprocessed features. The API receives raw subscriber data. Something needs to bridge that gap, and AI does not always build that bridge unless you ask for it.
Step 4: Test with Valid Input
Uvicorn is the server that runs the FastAPI app -- it starts listening for requests. Curl is a command-line tool that sends HTTP requests -- you use it to test the API by sending sample data and checking the response.
Start the server and test it. Direct Claude: "Start the FastAPI server with uvicorn and test the /predict endpoint with a curl request using sample subscriber features."
Look at the response. It should be a JSON object with a churn probability — a float between 0 and 1 — and a binary prediction. The probability is the important part. Emeka asked for ranked predictions so his team can prioritize: a subscriber with 0.82 probability gets a call before one with 0.34. A binary yes/no answer alone would not give his team that ranking.
This is the moment the model becomes a service. Anyone who can send an HTTP request to this endpoint can get a churn prediction. They do not need Python, scikit-learn, or your notebook. They need the URL and the input format.
Step 5: Test Edge Cases
Now test what happens when the input is wrong. Direct Claude to send requests with a missing feature, then with a value of the wrong type. Something like: "Send a curl request to /predict with the tenure_months field missing. Then send another with monthly_charges set to the string 'abc' instead of a number."
Do not predict what will happen — run the requests and read the responses.
AI-generated endpoints commonly handle the expected input format and leave the unexpected cases to fail silently or with unhelpful errors. The gap between "works with good input" and "handles real-world input" is where production problems live. A subscriber record pulled from Emeka's billing system might have a null field or a formatting inconsistency. The endpoint needs to respond with a clear error, not a stack trace or a silent wrong answer.
If the error handling is missing or unclear, direct Claude to add input validation. Be specific: "Add Pydantic validation to the request model so missing required fields and wrong types return a 422 with a clear error message."
Step 6: Add Prediction Logging
Direct Claude to add request logging: "Log each prediction request and response to a file with a timestamp. Include the input features, the predicted probability, and the binary prediction."
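One plausible shape for that logging, using Python's standard logging module to write one JSON object per line so the file stays easy to parse later. The filename and field names are illustrative:

```python
# Log each prediction as a timestamped JSON line.
import json
import logging

logger = logging.getLogger("predictions")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("predictions.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
logger.addHandler(handler)

def log_prediction(features: dict, probability: float, prediction: int):
    # one JSON object per line: timestamp, inputs, and both outputs
    logger.info(json.dumps({
        "features": features,
        "churn_probability": probability,
        "churn_prediction": prediction,
    }))

log_prediction({"tenure_months": 3, "monthly_charges": 82.5}, 0.82, 1)
```

In the FastAPI app, the call to log_prediction would sit inside the /predict handler, after the model produces its output.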
This is the first monitoring signal. Right now, the model is new and the training data is fresh. But subscriber behavior changes. Tunde Mobile might launch a new plan, change pricing, or expand to a new city. When that happens, the patterns the model learned from twelve months of historical data may no longer match what is happening now. A deployed ML model degrades over time as the world changes around it — unlike a static piece of software that does the same thing forever.
The server being up is not the same as the predictions being good. The endpoint can return 200 for every request while the predictions drift further from reality. The prediction log is how you notice. If the model starts predicting churn probability of 0.02 for every subscriber next month, the log will show it. Without the log, no one knows until Emeka's team reports that the calls are not working.
Step 7: The API Contract
Emeka's team needs to know what the endpoint expects and what it returns. Direct Claude to describe the API contract: what URL to call, what JSON fields to send, what the response looks like, and what errors mean.
This is professional communication — the same kind as translating metrics into business terms in Unit 3. The contract is the document anyone calling the endpoint needs. Without it, Emeka's developer has to guess at field names and data types, or read your source code.
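A contract can be short. A sketch, with illustrative field names and values:

```json
{
  "endpoint": "POST /predict",
  "request_body": {"tenure_months": 3, "monthly_charges": 82.5},
  "response_200": {"churn_probability": 0.82, "churn_prediction": 1},
  "error_422": "a required field is missing or mistyped; the response body names the field"
}
```

However it is written up, it should cover the four things above: the URL, the request fields and their types, the response fields, and what the errors mean.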
Emeka asks: "So my team can just send a subscriber's data and get back how likely they are to leave? And it's ranked — highest risk first?" Yes. The probability score is the ranking. His team pulls the week's subscribers, sends each through the endpoint, sorts by probability, and calls from the top. The highest-risk subscribers get attention first. That is exactly what he asked for in the brief.
✓ Check: The endpoint returns a JSON response with a churn probability for a valid sample request (curl returns 200 with a probability between 0 and 1).