
CLAUDE.md

P1 Descriptive Analysis — Muthoni Veterinary Clinic

Client

Wanjiku Muthoni, Owner and Head Veterinarian at Muthoni Veterinary Clinic in Nairobi's Kilimani neighbourhood. The clinic is a small-animal practice that sees 25-30 pets a day by appointment and has a staff of six.

What you are building

A descriptive analysis of 18 months of appointment data that answers three questions and produces a findings report for Wanjiku's upcoming staff meeting:

  1. What is the actual no-show rate?
  2. What are the patterns by day of week, time slot, visit type, and client tenure?
  3. Is the no-show problem getting worse over time?
  4. A findings report Wanjiku can present to her team.

The deliverables are a Jupyter notebook containing all the analysis and a markdown findings report.

Tech stack

  • Python 3.11+ (conda environment: ds)
  • Jupyter Notebook
  • pandas
  • matplotlib / seaborn
  • scipy (chi-square test)
  • Git / GitHub

File structure

materials/
  CLAUDE.md              ← This file. Project context.
  client-email.md        ← Wanjiku's initial email (read first)
  data-dictionary.md     ← Column definitions and allowed values
  appointments.csv       ← 18 months of appointment records (~8,000 rows)
  analysis-specification.md  ← What to compute and how
  verification-targets.md    ← Expected values to check AI output against
  report-template.md     ← Findings report structure for the staff meeting

The student creates:

  • A Jupyter notebook with the analysis
  • A findings report (from the template)
  • A decision record documenting one analytical choice

Key material references

  • data-dictionary.md — The column contract. Verify the dataset matches this before computing anything.
  • analysis-specification.md — What to compute: overall rate with CI, breakdowns with CIs, chi-square test, temporal trend.
  • verification-targets.md — Expected values to check AI output against. Every computed number should be checked here.
  • report-template.md — Structure for the findings report. Confidence intervals belong in the executive summary, not buried in details.
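A minimal sketch of the "verify the dataset matches the data dictionary" step, assuming pandas. The column names (`appointment_date`, `visit_type`, `client_type`, `status`) and the allowed values below are hypothetical placeholders, not copied from data-dictionary.md; substitute the real ones before relying on it.

```python
import pandas as pd

# Hypothetical contract: placeholder columns and values, NOT taken from
# data-dictionary.md. Replace them with the documented ones.
EXPECTED_COLUMNS = {"appointment_date", "visit_type", "client_type", "status"}
ALLOWED_VALUES = {
    "status": {"attended", "no_show", "cancelled"},
    "client_type": {"new", "returning"},
}


def check_contract(df: pd.DataFrame) -> list[str]:
    """Return human-readable contract violations (empty list if none)."""
    problems = []
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    for col, allowed in ALLOWED_VALUES.items():
        if col not in df.columns:
            continue  # already reported as missing above
        unexpected = set(df[col].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{col}: unexpected values {sorted(unexpected)}")
    return problems
```

In the notebook this would run right after `df = pd.read_csv("materials/appointments.csv")`; an empty list means the data matches the (placeholder) contract, and anything else should be resolved before computing rates.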

Ticket list

  • T1: Project setup and data loading. Download materials, read the client email, open the dataset, verify it matches the data dictionary.
  • T2: Data profiling (focused). Summary statistics, missing values, distributions of key variables. Directed profiling, not undirected EDA.
  • T3: Overall no-show rate with 95% CI on the correct denominator. Exclude advance cancellations. Check against verification target.
  • T4: Breakdowns by day of week, time slot, visit type, and client tenure — each with CIs. Chi-square test on visit type vs appointment status.
  • T5: Monthly no-show rate trend over 18 months. Ensure notebook reproducibility (restart and run all).
  • T6: Draft findings report using the template. Verify every number against the notebook. AI self-review with specific checks.
  • T7: Deliver findings to Wanjiku, receive feedback, address any requests. Write decision record. Commit and push to GitHub.
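The T4 chi-square test and the T5 monthly trend could be sketched as below with pandas and scipy. This assumes hypothetical `appointment_date`, `visit_type`, and `status` columns and hypothetical `"cancelled"` / `"no_show"` labels. Advance cancellations are dropped first so both computations share the T3 denominator; whether the test should instead keep all statuses is a choice to confirm against analysis-specification.md.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical status labels; check data-dictionary.md for the real ones.
CANCELLED = "cancelled"
NO_SHOW = "no_show"


def visit_type_chi_square(df: pd.DataFrame) -> tuple[float, float, int]:
    """Chi-square test of independence: visit type vs appointment status."""
    kept = df[df["status"] != CANCELLED]  # same denominator as T3
    table = pd.crosstab(kept["visit_type"], kept["status"])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return float(chi2), float(p_value), int(dof)


def monthly_no_show_rate(df: pd.DataFrame) -> pd.Series:
    """No-show rate per calendar month, advance cancellations excluded."""
    kept = df[df["status"] != CANCELLED]
    month = pd.to_datetime(kept["appointment_date"]).dt.to_period("M")
    is_no_show = kept["status"] == NO_SHOW
    # Mean of a boolean Series per month = share of no-shows that month.
    return is_no_show.groupby(month).mean()
```

`chi2_contingency` applies the Yates continuity correction only when the table has one degree of freedom, so with several visit types the plain Pearson statistic is used. Plotting the returned monthly Series gives the 18-month trend line to compare with the "relatively stable" target.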

Verification targets

See verification-targets.md for all expected values. Key checks:

  • Overall no-show rate should be in the low teens (12-15%) on the correct denominator
  • The wrong denominator (one that includes advance cancellations) produces a noticeably lower rate (9-11%)
  • Vaccination follow-ups should have the highest no-show rate among visit types
  • Chi-square on visit type should be significant (p < 0.05)
  • New clients should have a higher no-show rate than returning clients
  • Temporal trend should be relatively stable
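The effect of the denominator choice can be checked with a few lines of standard-library Python. This sketch uses a Wilson score interval for the 95% CI; that method is an assumption here (analysis-specification.md may prescribe a different one), and the status labels `"no_show"`, `"attended"`, and `"cancelled"` are hypothetical.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% when z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def no_show_summary(statuses: list[str]) -> dict[str, float]:
    """Correct-denominator rate with CI, plus the wrong-denominator rate."""
    no_shows = statuses.count("no_show")
    # Correct denominator: appointments the client was expected to attend,
    # i.e. everything except advance cancellations.
    expected = len(statuses) - statuses.count("cancelled")
    low, high = wilson_ci(no_shows, expected)
    return {
        "rate": no_shows / expected,
        "ci_low": low,
        "ci_high": high,
        # Wrong denominator: cancellations included, which dilutes the rate.
        "wrong_denominator_rate": no_shows / len(statuses),
    }
```

With a synthetic list of 130 no-shows, 870 attended, and 300 cancelled appointments, `rate` is 0.13 while `wrong_denominator_rate` is 0.10, the same kind of drop the targets above describe. The Wilson interval is used rather than the normal-approximation (Wald) interval because it stays inside [0, 1] and behaves better for the small counts that appear in per-category breakdowns.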

Commit convention

Write meaningful messages that describe what was done and what was verified. Examples:

  • "Add no-show rate computation — verified against target, correct denominator"
  • "Add category breakdowns with CIs — vaccination follow-ups highest as expected"
  • "Draft findings report — all numbers verified against notebook output"