Step 1: Send the deliverables
Send Wanjiku the ranked appointment list and the client summary. Include a brief message explaining what the list shows: these are the upcoming appointments ranked by predicted no-show risk. The ones at the top are where Grace should focus her reminder calls. The ones at the bottom are low-risk and probably do not need extra attention.
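The ranking itself is mechanical once the model has produced per-appointment probabilities. A minimal sketch, assuming each appointment is a record with a predicted no-show probability (the field names and clients here are illustrative, not from the actual project):

```python
# Hypothetical appointment records with model-predicted no-show probabilities.
appointments = [
    {"client": "A. Mwangi", "predicted_no_show_prob": 0.12},
    {"client": "B. Otieno", "predicted_no_show_prob": 0.61},
    {"client": "C. Njeri",  "predicted_no_show_prob": 0.34},
]

# Highest predicted risk first: the top of this list is where Grace
# should focus her reminder calls.
ranked = sorted(
    appointments,
    key=lambda a: a["predicted_no_show_prob"],
    reverse=True,
)

for appt in ranked:
    print(f'{appt["client"]}: {appt["predicted_no_show_prob"]:.0%}')
```

The deliverable is just this sorted view exported in whatever form Wanjiku and Grace can use directly (a spreadsheet, a printed list), not the notebook itself.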
This is the moment where all the upstream work converges. The cleaning decisions, the temporal split, the model evaluation, the RMSE translation -- all of it exists so that this list is honest and this summary is accurate.
Step 2: Read Wanjiku's response
Wanjiku will respond. She has been waiting for this since the email that started the project. Read what she says carefully. She may ask a practical follow-up question about using the model going forward.
If she asks how often to re-run the model, that is a reasonable question. A brief, practical answer works: as new appointment data accumulates (every few months), re-fit the model on the updated data. The patterns may shift with seasons or staff changes. There is no formula for re-run frequency -- it depends on how fast the clinic's patterns change.
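If it helps to make that answer concrete, the re-run decision can be framed as a simple trigger rather than a fixed schedule. A sketch, with entirely illustrative thresholds (the source is explicit that there is no formula for re-run frequency):

```python
def should_refit(new_rows_since_fit: int, months_since_fit: int) -> bool:
    """Hypothetical re-fit trigger: enough new appointment data has
    accumulated, or enough calendar time has passed that seasonal or
    staffing patterns may have shifted. Both thresholds are illustrative."""
    return new_rows_since_fit >= 500 or months_since_fit >= 3

# Example: only 120 new appointments, but 4 months have passed.
print(should_refit(new_rows_since_fit=120, months_since_fit=4))  # True
```

The point of the sketch is the shape of the answer, not the numbers: re-fit when the data or the calendar suggests the patterns may have moved, and tune the thresholds to how fast the clinic actually changes.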
Step 3: Address accuracy questions
If Wanjiku or her associate vet asks about accuracy, reference the RMSE in plain language: "The model is typically off by about X percentage points." Not R-squared. Not a confidence interval. The practical error magnitude.
A prediction without an indication of how much it might miss by is not a finished prediction. The RMSE is what makes it honest. If the model says an appointment has a 30% no-show probability and the RMSE is 0.15 (15 percentage points), the real probability could reasonably be anywhere from 15% to 45%. That range matters for scheduling decisions.
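Translating a point prediction plus RMSE into that plain-language range is a one-liner. A sketch, treating the plus-or-minus-one-RMSE band as a rough "typically off by" range rather than a formal confidence interval:

```python
def plausible_range(predicted_prob: float, rmse: float) -> tuple[float, float]:
    """Band the prediction by one RMSE either side, clipped to [0, 1].
    This is an informal 'typically off by' range, not a confidence interval."""
    low = max(0.0, predicted_prob - rmse)
    high = min(1.0, predicted_prob + rmse)
    return low, high

low, high = plausible_range(0.30, 0.15)
print(f"Predicted 30%; could reasonably be anywhere from {low:.0%} to {high:.0%}")
```

Clipping matters at the edges: a 5% prediction with a 0.15 RMSE should be reported as "roughly 0% to 20%", not a negative probability.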
Step 4: Write the decision record
Write a decision record documenting the temporal split decision. This is the most important analytical choice in the project, and it deserves documentation.
The record should cover:
- What was tried first: Random train/test split (AI's default).
- Why it was wrong: Random splitting on time-dependent data is data leakage. The model "sees" future patterns during training and gets credit for predicting data it already learned from. The accuracy is fake.
- What was done instead: Temporal split -- train on the first 15 months, test on the last 6.
- What the impact was: R-squared dropped from the random-split value to the temporal-split value. That is not the model getting worse. That is the evaluation becoming honest.
This decision record is not overhead. If someone picks up this project later -- or if you re-run the model with new data -- the record explains why the temporal split matters and what happens if you skip it.
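The decision record can also show, in a few lines, what the temporal split looks like in code. A sketch, assuming each record carries an appointment date; the dates and the cutoff marking the end of month 15 are illustrative:

```python
from datetime import date

# Hypothetical records from a 21-month appointment history.
records = [
    {"appt_date": date(2023, 1, 10), "no_show": 0},
    {"appt_date": date(2023, 9, 5),  "no_show": 1},
    {"appt_date": date(2024, 2, 20), "no_show": 0},
    {"appt_date": date(2024, 6, 1),  "no_show": 1},
]

# Temporal split: train on the first 15 months, test on the last 6.
# The model never sees anything from the test period during training,
# which is what a random split gets wrong on time-dependent data.
cutoff = date(2024, 4, 1)  # end of month 15 in this hypothetical timeline
train = [r for r in records if r["appt_date"] < cutoff]
test = [r for r in records if r["appt_date"] >= cutoff]
```

The contrast with a random split is the whole argument: a random split would scatter 2024 appointments into the training set, letting the model train on patterns from the very period it is later scored against.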
Step 5: Commit and push
Direct AI to commit the full project to Git. The commit should include:
- The Jupyter notebook with the complete analysis
- The ranked appointment list
- The client summary
- The preparation log from Unit 2
- The decision record from Step 4
Use a meaningful commit message that describes the completed work: what was built, the key analytical decision (temporal split), and the deliverable (ranked appointment list for Wanjiku).
Direct AI to push to GitHub. Verify the push succeeded.
The repository should contain everything someone would need to understand the analysis: the data preparation decisions, the model building process, the evaluation against the naive baseline, and the client-facing deliverables. The preparation log and decision record are as important as the notebook -- they document why the work was done the way it was.
Check: Wanjiku has received and responded to the findings. The Git repository contains the notebook, the ranked list, the summary, the preparation log, and the decision record. The push succeeded.