P1 Descriptive Analysis — Muthoni Veterinary Clinic
Client
Wanjiku Muthoni, Owner and Head Veterinarian at Muthoni Veterinary Clinic in Nairobi's Kilimani neighbourhood. Small animal practice seeing 25-30 pets daily by appointment. Staff of 6.
What you are building
A descriptive analysis of 18 months of appointment data that answers three questions for Wanjiku's upcoming staff meeting and packages the answers in a report:
- What is the actual no-show rate?
- What are the patterns by day of week, time slot, visit type, and client tenure?
- Is the no-show problem getting worse over time?
- Output: a findings report Wanjiku can present to her team.
The deliverable is a Jupyter notebook containing all analysis and a markdown findings report.
Tech stack
- Python 3.11+ (conda environment: ds)
- Jupyter Notebook
- pandas
- matplotlib / seaborn
- scipy (chi-square test)
- Git / GitHub
File structure
materials/
  CLAUDE.md ← This file. Project context.
  client-email.md ← Wanjiku's initial email (read first)
  data-dictionary.md ← Column definitions and allowed values
  appointments.csv ← 18 months of appointment records (~8,000 rows)
  analysis-specification.md ← What to compute and how
  verification-targets.md ← Expected values to check AI output against
  report-template.md ← Findings report structure for the staff meeting
The student creates:
- A Jupyter notebook with the analysis
- A findings report (from the template)
- A decision record documenting one analytical choice
Key material references
- data-dictionary.md — The column contract. Verify the dataset matches this before computing anything.
- analysis-specification.md — What to compute: overall rate with CI, breakdowns with CIs, chi-square test, temporal trend.
- verification-targets.md — Expected values to check AI output against. Every computed number should be checked here.
- report-template.md — Structure for the findings report. Confidence intervals belong in the executive summary, not buried in details.
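Checking the dataset against the column contract can be sketched as below. This is a minimal illustration, not the project's prescribed check: the column names (`status`, `visit_type`) and allowed values are hypothetical placeholders, and the real ones come from data-dictionary.md.

```python
import pandas as pd

# ASSUMPTION: placeholder contract. Replace with the columns and allowed
# values actually listed in data-dictionary.md.
EXPECTED = {
    "status": {"completed", "no_show", "cancelled"},
    "visit_type": {"checkup", "vaccination_followup", "surgery"},
}

def check_contract(df: pd.DataFrame, expected: dict[str, set[str]]) -> list[str]:
    """Return human-readable contract violations (empty list if clean)."""
    problems = []
    for col, allowed in expected.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        # Values present in the data but absent from the dictionary
        unexpected = set(df[col].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{col}: unexpected values {sorted(unexpected)}")
    return problems

# Tiny synthetic frame for illustration only, not clinic data.
demo = pd.DataFrame({
    "status": ["completed", "no_show", "late"],
    "visit_type": ["checkup", "surgery", "checkup"],
})
problems = check_contract(demo, EXPECTED)
print(problems)
```

In the notebook, the same function would be called on the frame loaded from appointments.csv, and any non-empty result resolved before computing rates.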
Ticket list
- T1: Project setup and data loading. Download materials, read the client email, open the dataset, verify it matches the data dictionary.
- T2: Data profiling (focused). Summary statistics, missing values, distributions of key variables. Directed profiling, not undirected EDA.
- T3: Overall no-show rate with 95% CI on the correct denominator. Exclude advance cancellations. Check against verification target.
- T4: Breakdowns by day of week, time slot, visit type, and client tenure — each with CIs. Chi-square test on visit type vs appointment status.
- T5: Monthly no-show rate trend over 18 months. Ensure notebook reproducibility (restart and run all).
- T6: Draft findings report using the template. Verify every number against the notebook. AI self-review with specific checks.
- T7: Deliver findings to Wanjiku, receive feedback, address any requests. Write decision record. Commit and push to GitHub.
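The rate computations in T3 and T4 might look roughly like the sketch below. Several things here are assumptions: the column name `status`, the labels `no_show` and `cancelled`, and the choice of the Wilson score interval (analysis-specification.md defines the CI method actually required). The demo data is synthetic.

```python
import math
import pandas as pd

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for k events in n trials (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def no_show_summary(df, status_col="status", no_show="no_show", cancelled="cancelled"):
    """No-show rate with CI, excluding advance cancellations from the denominator."""
    kept = df[df[status_col] != cancelled]
    n = len(kept)
    k = int((kept[status_col] == no_show).sum())
    low, high = wilson_ci(k, n)
    return {"n": n, "no_shows": k, "rate": k / n, "ci_low": low, "ci_high": high}

def breakdown(df, by, **kwargs):
    """One summary row per group (day of week, time slot, visit type, ...)."""
    rows = {key: no_show_summary(group, **kwargs) for key, group in df.groupby(by)}
    return pd.DataFrame.from_dict(rows, orient="index")

# Synthetic illustration only: 70 completed, 10 no-shows, 20 cancellations.
demo = pd.DataFrame({
    "status": ["completed"] * 70 + ["no_show"] * 10 + ["cancelled"] * 20,
    "weekday": ["Mon", "Tue"] * 50,
})
overall = no_show_summary(demo)
by_day = breakdown(demo, "weekday")
# The wrong denominator keeps cancellations in and understates the rate.
wrong_rate = (demo["status"] == "no_show").mean()
print(overall["rate"], wrong_rate)
```

The same `breakdown` helper could serve the T5 monthly trend by grouping on a month period derived from the appointment date column, whatever that column is called in the data dictionary. A group containing only cancellations would raise in `wilson_ci`, which is deliberate: an empty denominator should be noticed, not silently reported.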
Verification targets
See verification-targets.md for all expected values. Key checks:
- Overall no-show rate should be in the low teens (12-15%) on the correct denominator
- Wrong denominator (including cancellations) produces a noticeably lower rate (9-11%)
- Vaccination follow-ups should have the highest no-show rate among visit types
- Chi-square on visit type should be significant (p < 0.05)
- New clients should have a higher no-show rate than returning clients
- Temporal trend should be relatively stable
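The chi-square check on visit type can be sketched with `pandas.crosstab` and `scipy.stats.chi2_contingency`. Column names, category labels, and counts below are hypothetical and synthetic. Whether cancellations are dropped before building the table is a choice for analysis-specification.md; this demo simply contains none.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic illustration only: two visit types with different no-show counts.
demo = pd.DataFrame({
    "visit_type": ["checkup"] * 50 + ["vaccination_followup"] * 50,
    "status": ["completed"] * 45 + ["no_show"] * 5
              + ["completed"] * 30 + ["no_show"] * 20,
})

# Contingency table: rows are visit types, columns are appointment statuses.
table = pd.crosstab(demo["visit_type"], demo["status"])

# Returns the statistic, p-value, degrees of freedom, and expected counts.
# For a 2x2 table SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
# Rule of thumb: expected counts below 5 make the approximation unreliable.
print("min expected count:", expected.min())
```

A significant result only says that no-show rates differ across visit types somewhere; the per-type rates and CIs show which type drives the difference.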
Commit convention
Meaningful messages describing what was done and verified. Examples:
- "Add no-show rate computation — verified against target, correct denominator"
- "Add category breakdowns with CIs — vaccination follow-ups highest as expected"
- "Draft findings report — all numbers verified against notebook output"