Analysis Specification — Appointment No-Show Patterns

This document specifies the analytical tasks for the Muthoni Veterinary Clinic appointment data. Each section describes what to compute and what the result should include. These are analytical requirements, not code instructions — direct AI to produce each result and verify the output.

1. Overall No-Show Rate

Compute the overall no-show rate as a percentage with a 95% confidence interval.

Denominator: The rate should be calculated on scheduled appointments only — that is, appointments where the client either showed up or did not show up. Advance cancellations are a different phenomenon (the client actively cancelled) and should be excluded from the denominator. The no-show rate answers: "of the people who were expected to show up, how many didn't?"

What to produce:

The no-show rate as a percentage
A 95% confidence interval for the rate (Wilson or normal approximation — either is acceptable at this sample size)
The numerator and denominator used in the calculation (so the computation can be verified)
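
As a way to check the reported figures by hand, the Wilson score interval can be sketched as follows. This is a minimal illustration, not the required implementation: the function name `wilson_interval` and the counts in the example call are hypothetical and do not come from the clinic data.

```python
import math

def wilson_interval(no_shows, scheduled, z=1.96):
    """Return (rate, lower, upper) using the Wilson score interval.

    no_shows  -- numerator: count of No-show records
    scheduled -- denominator: Show + No-show records (cancellations excluded)
    z         -- normal quantile; 1.96 corresponds to a 95% interval
    """
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    p = no_shows / scheduled
    denom = 1 + z**2 / scheduled
    centre = (p + z**2 / (2 * scheduled)) / denom
    half = z * math.sqrt(p * (1 - p) / scheduled + z**2 / (4 * scheduled**2)) / denom
    return p, centre - half, centre + half

# Hypothetical counts, for illustration only (not taken from the clinic data).
rate, low, high = wilson_interval(no_shows=120, scheduled=1000)
print(f"No-show rate: {rate:.1%} (95% CI {low:.1%} to {high:.1%}), n = 120 / 1000")
```

Reporting the numerator and denominator alongside the interval, as the print line does, lets a reviewer recompute the result independently.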

2. Breakdowns by Category

Compute the no-show rate broken down by each of the following variables. Each breakdown should include the rate and a 95% confidence interval for each category.

Variables to break down by:

day_of_week — Is the no-show rate worse on certain days? Report the rate for each day (Monday through Saturday).
time_slot — Is the no-show rate worse at certain times? Report the rate for Morning, Afternoon, and Evening.
visit_type — Is the no-show rate worse for certain appointment types? Report the rate for Consultation, Vaccination, Dental, and Surgery. Wanjiku suspects vaccination follow-ups are worst.
client_tenure — Do new clients and returning clients no-show at different rates? Report the rate for New and Returning.

Denominator for all breakdowns: Same as Section 1 — exclude advance cancellations. Compute the rate within each category using only Show and No-show records for that group.
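
The denominator rule for breakdowns can be sketched as below. This assumes the data is available as a list of dictionaries and that the status values are spelled exactly `Show`, `No-show`, and `Cancelled`; both are assumptions about the data format, and `no_show_rates` is a hypothetical helper name. The confidence interval for each category would then be computed from the returned counts using whichever method was chosen in Section 1.

```python
from collections import defaultdict

def no_show_rates(records, variable):
    """Return {category: (no_shows, scheduled, rate)} for one breakdown variable.

    Records whose status is anything other than Show or No-show
    (for example Cancelled) are left out of both numerator and denominator.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [no_shows, scheduled]
    for row in records:
        status = row["appointment_status"]
        if status not in ("Show", "No-show"):
            continue
        tally = counts[row[variable]]
        tally[1] += 1
        if status == "No-show":
            tally[0] += 1
    return {cat: (ns, n, ns / n) for cat, (ns, n) in counts.items()}

# Hypothetical records, for illustration only.
sample = [
    {"client_tenure": "New", "appointment_status": "No-show"},
    {"client_tenure": "New", "appointment_status": "Show"},
    {"client_tenure": "New", "appointment_status": "Cancelled"},
    {"client_tenure": "Returning", "appointment_status": "Show"},
    {"client_tenure": "Returning", "appointment_status": "Show"},
]
rates = no_show_rates(sample, "client_tenure")
print(rates)
```

The same function can be called once per variable (day_of_week, time_slot, visit_type, client_tenure), which keeps the denominator rule identical across all four breakdowns.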

3. Hypothesis Testing — Visit Type

Run a chi-square test of independence to determine whether the distribution of appointment outcomes (Show vs No-show) differs across visit types.

What the test checks: Whether the relationship between visit type and no-show behavior is statistically significant — that is, whether the differences in no-show rates across visit types are unlikely to have occurred by chance alone.

What to report:

The chi-square test statistic
The p-value
The interpretation: what the result means in plain language for someone who is not a statistician

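What to watch for: AI may misinterpret the p-value. A p-value of 0.03 does not mean "there is a 3% chance the null hypothesis is true." It means: "if there were truly no relationship between visit type and no-shows, we would observe a pattern at least this extreme only 3% of the time." The distinction matters: the p-value describes how surprising the data would be under the no-relationship assumption, not how likely that assumption is to be correct.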

Scope: Run the chi-square test on visit_type versus appointment_status (Show/No-show only — exclude Cancelled records from this test, consistent with the denominator decision above).
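
To verify the statistic and p-value that AI reports, the test can be sketched with the standard library alone. This is an illustration under assumptions: the function names are hypothetical, the counts in the example table are invented, and no continuity correction is applied. The p-value helper uses the closed-form recurrence for the chi-square upper tail with integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)               # df = 2 base case
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))    # df = 1 base case
    while a < df / 2.0:
        # Recurrence: Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
        q += t**a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q

def chi_square_independence(table):
    """table: one row of observed counts per visit type, e.g. [shows, no_shows].

    Returns (statistic, degrees_of_freedom, p_value).
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical [Show, No-show] counts per visit type, for illustration only.
observed = [
    [400, 50],   # Consultation
    [300, 70],   # Vaccination
    [150, 20],   # Dental
    [90, 10],    # Surgery
]
stat, df, p = chi_square_independence(observed)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

With four visit types and two outcomes, the test has three degrees of freedom; a reported result with a different value suggests Cancelled records or an extra category were included.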

4. Temporal Trend

Compute the monthly no-show rate over the full 18-month period and plot the trend.

What to produce:

A table or series showing the no-show rate for each month
A line chart showing the monthly rate over time
A brief assessment: is the rate increasing, decreasing, or relatively stable? Are there any notable spikes or dips?

Denominator: Same as Section 1: exclude advance cancellations. Compute the monthly rate using only Show and No-show records for each month.

What to look for: Wanjiku wants to know whether the problem is getting worse. The trend chart should answer this visually. Month-to-month variation is normal — the question is whether there is a sustained upward or downward trajectory.
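
One way to separate a sustained trajectory from month-to-month noise is to fit a straight line through the monthly rates and look at its slope. The sketch below assumes each record carries an ISO-formatted date in a field called `appointment_date`; that field name is hypothetical, as are the helper names and the sample records. It also assumes every month in the period has at least one scheduled appointment, so consecutive list positions correspond to consecutive months.

```python
from collections import defaultdict

def monthly_no_show_rates(records):
    """Return [(month, rate), ...] sorted by month, using Show and No-show records only."""
    counts = defaultdict(lambda: [0, 0])  # "YYYY-MM" -> [no_shows, scheduled]
    for row in records:
        status = row["appointment_status"]
        if status not in ("Show", "No-show"):
            continue
        month = row["appointment_date"][:7]  # keep the "YYYY-MM" prefix
        counts[month][1] += 1
        if status == "No-show":
            counts[month][0] += 1
    return [(m, ns / n) for m, (ns, n) in sorted(counts.items())]

def trend_slope(rates):
    """Least-squares slope of rate against month index (change in rate per month).

    Requires at least two months of data.
    """
    n = len(rates)
    mean_x = (n - 1) / 2
    mean_y = sum(r for _, r in rates) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (r - mean_y) for x, (_, r) in enumerate(rates))
    return sxy / sxx

def _month(date, statuses):
    # Helper that builds hypothetical records for one month.
    return [{"appointment_date": date, "appointment_status": s} for s in statuses]

# Hypothetical records, for illustration only.
sample = (
    _month("2024-01-10", ["No-show", "Show", "Show", "Show"])
    + _month("2024-02-10", ["No-show", "No-show", "Show", "Show", "Cancelled"])
    + _month("2024-03-10", ["No-show", "No-show", "No-show", "Show"])
)
series = monthly_no_show_rates(sample)
slope = trend_slope(series)
print(series, slope)
```

A slope near zero relative to the typical month-to-month swing points to a stable rate; the line chart remains the primary deliverable, and the slope is only a numerical cross-check on what the chart shows.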
