P1 Descriptive Analysis — Muthoni Veterinary Clinic

Client

Wanjiku Muthoni, Owner and Head Veterinarian at Muthoni Veterinary Clinic in Nairobi's Kilimani neighbourhood. It is a small-animal practice that sees 25-30 pets a day by appointment, with a staff of 6.

What you are building

A descriptive analysis of 18 months of appointment data that answers three questions for Wanjiku's upcoming staff meeting:

What is the actual no-show rate?
How does the no-show rate vary by day of week, time slot, visit type, and client tenure?
Is the no-show problem getting worse over time?

The answers go into a findings report Wanjiku can present to her team.

The deliverable is a Jupyter notebook containing all analysis and a markdown findings report.

Tech stack

Python 3.11+ (conda environment: ds)
Jupyter Notebook
pandas
matplotlib / seaborn
scipy (chi-square test)
Git / GitHub

File structure

materials/
  CLAUDE.md              ← This file. Project context.
  client-email.md        ← Wanjiku's initial email (read first)
  data-dictionary.md     ← Column definitions and allowed values
  appointments.csv       ← 18 months of appointment records (~8,000 rows)
  analysis-specification.md  ← What to compute and how
  verification-targets.md    ← Expected values to check AI output against
  report-template.md     ← Findings report structure for the staff meeting

The student creates:

A Jupyter notebook with the analysis
A findings report (from the template)
A decision record documenting one analytical choice

Key material references

data-dictionary.md — The column contract. Verify the dataset matches this before computing anything.
analysis-specification.md — What to compute: overall rate with CI, breakdowns with CIs, chi-square test, temporal trend.
verification-targets.md — Expected values for checking AI output. Every computed number should be checked against this file.
report-template.md — Structure for the findings report. Confidence intervals belong in the executive summary, not buried in details.
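A minimal sketch of the data-dictionary check, assuming hypothetical column names (`appointment_date`, `time_slot`, `visit_type`, `client_tenure`, `status`) and hypothetical allowed values. The real contract lives in data-dictionary.md, and its names and values should replace these constants. The function collects every problem instead of stopping at the first one, so a single run reports all mismatches.

```python
import pandas as pd

# Hypothetical contract for illustration only; copy the real column names and
# allowed values from data-dictionary.md.
EXPECTED_COLUMNS = ["appointment_date", "time_slot", "visit_type", "client_tenure", "status"]
ALLOWED_VALUES = {
    "status": {"attended", "no_show", "cancelled"},
    "client_tenure": {"new", "returning"},
}


def check_against_dictionary(df: pd.DataFrame) -> list[str]:
    """Return every mismatch between df and the dictionary; an empty list means it matches."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    for col, allowed in ALLOWED_VALUES.items():
        if col not in df.columns:
            continue  # already reported as missing above
        # Nulls are left to T2 profiling; only non-null values are checked here
        unexpected = set(df[col].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{col}: values not in dictionary: {sorted(map(str, unexpected))}")
    return problems
```

In the notebook this would run straight after loading, for example `problems = check_against_dictionary(pd.read_csv("materials/appointments.csv"))`, assuming the notebook runs from the repository root. T1 is done only when `problems` is empty.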

Ticket list

T1: Project setup and data loading. Download materials, read the client email, open the dataset, verify it matches the data dictionary.
T2: Data profiling (focused). Summary statistics, missing values, distributions of key variables. Directed profiling, not undirected EDA.
T3: Overall no-show rate with 95% CI on the correct denominator, which excludes advance cancellations. Check the result against the verification target.
T4: Breakdowns by day of week, time slot, visit type, and client tenure — each with CIs. Chi-square test on visit type vs appointment status.
T5: Monthly no-show rate trend over 18 months. Ensure notebook reproducibility (restart and run all).
T6: Draft findings report using the template. Verify every number against the notebook. AI self-review with specific checks.
T7: Deliver findings to Wanjiku, receive feedback, address any requests. Write decision record. Commit and push to GitHub.

Verification targets

See verification-targets.md for all expected values. Key checks:

Overall no-show rate should be in the low teens (12-15%) on the correct denominator
Wrong denominator (including cancellations) produces a noticeably lower rate (9-11%)
Vaccination follow-ups should have the highest no-show rate among visit types
Chi-square on visit type should be significant (p < 0.05)
New clients should have a higher no-show rate than returning clients
The monthly no-show rate should be relatively stable over the 18 months
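A sketch of the chi-square check on visit type, using `scipy.stats.chi2_contingency` on a visit type × status crosstab. Column and label names are hypothetical. The sketch also assumes advance cancellations are dropped, so the table compares attended with no-show. Keeping a cancelled column is another reading of "appointment status", and whichever is chosen is a natural candidate for the decision record.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical column and label names; use the ones in data-dictionary.md.
STATUS, CANCELLED = "status", "cancelled"


def visit_type_chi_square(df: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Chi-square test of independence between visit type and attendance.

    Assumption: advance cancellations are dropped so the table is
    attended vs no-show, matching the no-show denominator.
    """
    booked = df[df[STATUS] != CANCELLED]
    table = pd.crosstab(booked["visit_type"], booked[STATUS])
    chi2, p, dof, expected = chi2_contingency(table)
    return {
        "chi2": float(chi2),
        "p_value": float(p),
        "dof": int(dof),
        "significant": bool(p < alpha),
        # Expected counts below about 5 weaken the chi-square approximation
        "min_expected": float(expected.min()),
    }
```

`min_expected` is returned because the chi-square approximation is usually treated as unreliable when expected cell counts fall below about 5. If a rare visit type triggers that, the caveat belongs in the findings report alongside the p-value.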

Commit convention

Meaningful messages describing what was done and verified. Examples:

"Add no-show rate computation — verified against target, correct denominator"
"Add category breakdowns with CIs — vaccination follow-ups highest as expected"
"Draft findings report — all numbers verified against notebook output"