Step 1: Read the model building plan
Open materials/project-plan.md, Section 3 (Model Building).
The plan calls for a linear regression predicting no-show probability from the available features: day_of_week, time_slot, visit_type, pet_species, client_tenure. Linear regression is the starting point because it is simple and interpretable. Wanjiku does not need a complex model. She needs one that works and that you can explain.
The plan also mentions splitting the data into training and test sets. The model learns from the training data and is evaluated on data it has never seen. This is how you know whether the model actually generalizes rather than just memorizing the training data.
Step 2: Split the data
Direct AI to split the cleaned dataset into a training set and a test set. AI will need to encode categorical features (one-hot encoding or similar) before fitting. Let AI handle the encoding and the split.
Watch what AI does with the split. It will use a default method. Note what that method is.
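The default split usually looks something like the sketch below. This is a minimal stand-in, not the project's actual notebook: the dataset here is fabricated, and the column names simply follow the features listed in the plan. The key detail to notice is that `train_test_split` shuffles by default.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned appointments dataset.
# Column names follow the plan; all values are fabricated.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "date": pd.date_range("2023-03-01", periods=n, freq="15h"),
    "day_of_week": rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri"], n),
    "time_slot": rng.choice(["morning", "afternoon"], n),
    "visit_type": rng.choice(["checkup", "vaccine", "surgery"], n),
    "pet_species": rng.choice(["dog", "cat"], n),
    "client_tenure": rng.integers(0, 10, n),
    "no_show": rng.integers(0, 2, n).astype(float),
})

# One-hot encode the categorical features; client_tenure stays numeric.
X = pd.get_dummies(df.drop(columns=["date", "no_show"]), dtype=float)
y = df["no_show"]

# The default method: a random shuffle (shuffle=True is the default),
# so rows from any month can land in either set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 800 200
```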
Step 3: Fit the model
Direct AI to fit a linear regression model on the training set, predicting no-show probability from the features. Then evaluate it on the test set: R-squared and RMSE.
R-squared tells you what fraction of the variation in no-shows the model explains. RMSE tells you the typical prediction error. Both are numbers that mean nothing in isolation -- you will need a reference point to interpret them.
Review the results. Note the R-squared value.
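The fit-and-evaluate step can be sketched as follows. Again the data is synthetic (with a weak fabricated signal so the fit has something to find), so the printed numbers mean nothing about the real dataset; only the mechanics carry over.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Fabricated stand-in data with a weak planted signal.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "day_of_week": rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri"], n),
    "time_slot": rng.choice(["morning", "afternoon"], n),
    "visit_type": rng.choice(["checkup", "vaccine", "surgery"], n),
    "pet_species": rng.choice(["dog", "cat"], n),
    "client_tenure": rng.integers(0, 10, n),
})
df["no_show"] = (0.1 * (df["visit_type"] == "vaccine")
                 + 0.02 * df["client_tenure"]
                 + rng.normal(0, 0.1, n))

X = pd.get_dummies(df.drop(columns=["no_show"]), dtype=float)
y = df["no_show"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training set, evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = mean_squared_error(y_test, pred) ** 0.5  # RMSE = sqrt(MSE)
print(round(r2, 3), round(rmse, 3))
```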
Step 4: Check against the verification target
Open materials/verification-targets.md. Find the targets for model building.
Compare your R-squared to the target range. If the number looks higher than the target says an honest model should produce, something is off. A model that looks too good is not a good model. It is a suspicious model.
Step 5: Investigate the split
Read the project plan's note on temporal discipline. This dataset has a time dimension. Every appointment has a date. When AI split the data, it shuffled rows randomly. Some training rows come from September; some test rows come from March. The model has "seen" patterns from every month during training, including the very periods it is tested on, and has even trained on appointments that occur after some of the ones it must predict.
That is data leakage. The model is being tested on questions it has already seen the answers to. The accuracy is fake.
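You can confirm the leakage directly by comparing the date ranges of the two sets. With a random split they interleave: the latest training appointment comes after the earliest test appointment. A minimal check, on the same kind of synthetic stand-in data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Fabricated dated rows standing in for the real appointments.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "date": pd.date_range("2023-03-01", periods=n, freq="15h"),
    "client_tenure": rng.integers(0, 10, n),
    "no_show": rng.integers(0, 2, n),
})
train, test = train_test_split(df, test_size=0.2, random_state=0)

# With a random split, training data extends past the start of the
# test data: the model has trained on appointments that happen after
# some of the ones it is tested on.
overlap = train["date"].max() > test["date"].min()
print(overlap)  # True
```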
Step 6: Re-split temporally
Direct AI to re-split the data using time: train on the first 15 months, test on the last 6 months. The model learns only from the past and is evaluated on genuinely future data. This is temporal discipline.
Re-fit the model on the new training set. Re-evaluate on the new test set.
The R-squared will drop. That is not the model getting worse. That is the model being honestly evaluated for the first time.
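A temporal re-split can be sketched like this: order by date, cut at the 15-month mark, and never let a future row into training. The data below is the same synthetic stand-in, stretched to roughly the 21-month span the plan describes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in: ~21 months of fabricated appointments.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "date": pd.date_range("2023-03-01", periods=n, freq="15h"),
    "visit_type": rng.choice(["checkup", "vaccine", "surgery"], n),
    "pet_species": rng.choice(["dog", "cat"], n),
    "client_tenure": rng.integers(0, 10, n),
})
df["no_show"] = (0.1 * (df["visit_type"] == "vaccine")
                 + 0.02 * df["client_tenure"]
                 + rng.normal(0, 0.1, n))

# Temporal split: train on the first 15 months, test on the rest.
cutoff = df["date"].min() + pd.DateOffset(months=15)
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]

X_train = pd.get_dummies(train.drop(columns=["date", "no_show"]), dtype=float)
X_test = pd.get_dummies(test.drop(columns=["date", "no_show"]), dtype=float)
# Align columns in case a category appears in only one period.
X_train, X_test = X_train.align(X_test, join="left", axis=1, fill_value=0.0)

# The model learns only from the past and predicts the future.
model = LinearRegression().fit(X_train, train["no_show"])
pred = model.predict(X_test)
print(round(r2_score(test["no_show"], pred), 3))
```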
Step 7: Understand the difference
Compare the two results side by side: random split versus temporal split.
The random split gave fake accuracy because the model could peek at future patterns. The temporal split gives honest accuracy because the model predicts data it could not have learned from. AI defaults to random splitting on every dataset. On time-dependent data, that default produces systematically wrong evaluations.
This is the prediction equivalent of P1's wrong denominator. Both produce plausible-looking numbers that are systematically off. Both are caught by checking against the verification targets.
Step 8: Commit
Commit the notebook with a message describing the temporal split and why it matters. Reference the verification target that caught the leakage.
Check: The model evaluated on a temporal split should show R-squared in the verification target range (lower than the random split). The random split R-squared should be noticeably higher (the leakage signal).
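This check can be expressed as a small helper. The bounds below are placeholders, not the real numbers from materials/verification-targets.md; substitute the actual target range from that file.

```python
# Hypothetical check. The bounds 0.05-0.35 are placeholder values
# standing in for the real range in materials/verification-targets.md.
def check_model_eval(r2_temporal, r2_random, low=0.05, high=0.35):
    """Return (in_target_range, leakage_signal_present)."""
    in_range = low <= r2_temporal <= high
    leakage_signal = r2_random > r2_temporal
    return in_range, leakage_signal

print(check_model_eval(0.18, 0.42))  # (True, True)
```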