Verification Targets
Check AI's output against these targets at each stage. If a result falls outside the expected range, investigate before proceeding.
Data preparation
- Missing values: The extended dataset should have 3-5% missing values in the time_slot and client_tenure columns, concentrated in the last 3 months only. The original 18 months should have no missing values. If AI reports missing values across the full dataset, check whether the missing-data mechanism is being investigated or just dropped.
Common AI error: AI drops all rows with missing values without checking whether the missingness is informative or concentrated in a specific time period.
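The concentration check takes two lines of pandas once the data has a time index. A minimal sketch on synthetic stand-in data, assuming the dataset carries a `month` column (1-21, with months 19-21 being the extension) alongside `time_slot`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 21 months of appointments with a month index.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "month": rng.integers(1, 22, size=n),
    "time_slot": rng.choice(["morning", "afternoon"], size=n),
})
# Inject ~4% missing values, concentrated in the last 3 months only.
recent_idx = df.index[df["month"] >= 19]
df.loc[rng.choice(recent_idx, size=40, replace=False), "time_slot"] = np.nan

# The check: missing rate per period, not just overall.
miss_rate = df["time_slot"].isna().groupby(df["month"] >= 19).mean()
print(miss_rate)  # False (months 1-18): 0.0; True (months 19-21): well above 0
```

If the per-period rates look like this, the missingness is informative and a blanket drop would silently discard a specific slice of recent history.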
- Row count after cleaning: After handling missing values, the dataset should retain at least 95% of its original rows (~9,000+). If significantly fewer rows remain, AI may be over-aggressively dropping data.
Common AI error: AI applies a blanket drop strategy across all columns, removing far more rows than necessary.
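The difference between a blanket drop and a targeted drop is one `subset=` argument. A sketch on synthetic data; the mostly-empty `notes` column is hypothetical, standing in for any column the model does not use:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "time_slot": rng.choice(["morning", "afternoon"], size=n),
    "client_tenure": rng.choice(["new", "returning"], size=n),
    # Hypothetical free-text column that is often empty.
    "notes": rng.choice(["follow-up", None], size=n),
})
df.loc[rng.choice(n, size=20, replace=False), "time_slot"] = None

blanket = df.dropna()                                        # any missing, anywhere
targeted = df.dropna(subset=["time_slot", "client_tenure"])  # model columns only

print(len(targeted) / n)  # 0.96 -- above the 95% retention target
print(len(blanket) / n)   # far lower: the 'notes' gaps drag everything down
```

The blanket drop fails the 95% retention target not because the modelling columns are dirty, but because an irrelevant column is.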
Model building
- Random-split R-squared (the leakage signal): If you split the data randomly (the default), R-squared will be suspiciously high -- above 0.30. This is NOT a good model. This is data leakage: the random split lets the model "see" future patterns while predicting past values.
Common AI error: AI defaults to `train_test_split(shuffle=True)` and reports the resulting metrics as legitimate.
- Temporal-split R-squared (the honest result): Train on the first 15 months, test on the last 6 months. R-squared should be in the 0.10-0.20 range. This is the honest model performance on genuinely unseen future data. It is lower than the random split because the model cannot cheat.
Common AI error: AI uses random splitting and does not flag the temporal nature of the data. The student must enforce temporal discipline.
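Enforcing the temporal split is a one-line discipline once the rows are ordered by time. A sketch on synthetic data, assuming a `month` column numbered 1-21 and a `no_show` outcome (both names are placeholders for whatever the real dataset uses):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 2100
df = pd.DataFrame({
    "month": rng.integers(1, 22, size=n),
    "no_show": rng.integers(0, 2, size=n),
}).sort_values("month")

# Temporal split: train on the first 15 months, test on the last 6.
train = df[df["month"] <= 15]
test = df[df["month"] > 15]

# Sanity check: every test appointment is strictly later than all training data.
assert train["month"].max() < test["month"].min()
print(len(train), len(test))
```

If you prefer `train_test_split`, passing `shuffle=False` on time-sorted data achieves the same effect, but the explicit month filter makes the cutoff auditable.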
- Temporal-split RMSE: The model should typically be off by about 8-12 percentage points on no-show probability (RMSE approximately 0.30-0.38 on the probability scale). This is the number Wanjiku needs: "the model is typically off by about X percentage points."
Common AI error: AI reports RMSE as a raw number without translating it into practical language for the client.
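Turning RMSE into the sentence the client needs is mechanical once you have predictions. A sketch with made-up outcomes and predicted probabilities (both arrays are illustrative, not real results):

```python
import numpy as np

# Hypothetical observed outcomes (1 = no-show) and predicted probabilities.
y_true = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.3, 0.6, 0.2, 0.5, 0.4])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"The model is typically off by about {rmse * 100:.0f} percentage points.")
```

Reporting the raw `0.37` invites misreading; the formatted sentence is what should reach Wanjiku.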
Evaluation
- Naive baseline R-squared: Predicting the overall no-show rate (the mean) for every appointment should give R-squared near 0. The model's R-squared should be meaningfully higher than this baseline. Without the baseline, you cannot judge whether the model adds value.
Common AI error: AI reports model R-squared without computing or comparing against a naive baseline.
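Computing the baseline is cheap enough that there is no excuse to skip it. A sketch on synthetic stand-in outcomes, using the plain R-squared definition so no modelling library is needed:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.random(400)              # stand-in for observed no-show outcomes
y_train, y_test = y[:300], y[300:]

# Naive baseline: predict the training-set mean for every test appointment.
baseline = np.full(y_test.shape, y_train.mean())

ss_res = np.sum((y_test - baseline) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_baseline = 1 - ss_res / ss_tot
print(round(r2_baseline, 3))  # near 0: the mean explains nothing
```

Any model whose temporal-split R-squared does not clearly beat this number is not adding value, whatever its other metrics say.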
- Coefficient signs: In the linear regression, these coefficients should be domain-plausible:
- Vaccination visit_type: positive (increases no-show risk)
- Morning time_slot: negative (decreases no-show risk)
- Returning client_tenure: negative (decreases no-show risk)
If any coefficient has the wrong sign or an implausible magnitude, investigate.
Common AI error: AI generates coefficients without checking whether they make domain sense -- optimizing for fit, not for business coherence.
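The sign check can be automated so it runs every time the model is refit. In this sketch the coefficient values and dummy-column names are hypothetical, standing in for whatever the fitted regression actually produces:

```python
import numpy as np

# Hypothetical fitted coefficients keyed by dummy-column name.
coefs = {
    "visit_type_vaccination": 0.08,    # expected positive
    "time_slot_morning": -0.05,        # expected negative
    "client_tenure_returning": -0.11,  # expected negative
}
# Expected signs from domain knowledge: +1 increases no-show risk, -1 decreases.
expected_sign = {
    "visit_type_vaccination": +1,
    "time_slot_morning": -1,
    "client_tenure_returning": -1,
}

flagged = [name for name, sign in expected_sign.items()
           if np.sign(coefs[name]) != sign]
print(flagged)  # empty list: all signs are domain-plausible in this example
```

A non-empty `flagged` list is the "investigate before proceeding" trigger: it does not prove the model is wrong, but it means fit quality alone cannot justify shipping it.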