Step 1: Break it down by day and time
You have an overall no-show rate. Wanjiku does not need an overall number — she needs to know which days and which time slots are worst, so she can decide where to double-book and where to leave the schedule alone.
Open materials/analysis-specification.md and read the section on breakdowns. The specification asks for no-show rates by day of week and by time slot, each with confidence intervals.
Direct Claude to compute no-show rates for each day of the week and each time slot. Make sure you ask for confidence intervals — the same rule applies here as it did with the overall rate. A breakdown without intervals is a list of numbers that might all be noise.
Check the results against materials/verification-targets.md. Look at the day-of-week breakdown first. Are there days that stand out? Then look at time slots. Do the patterns match what you would expect from a small veterinary clinic — mornings versus afternoons versus evenings?
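To see what this kind of breakdown involves under the hood, here is a minimal sketch of a grouped rate with Wilson confidence intervals. The column names (`day_of_week`, `status`) and the tiny inline table are assumptions for illustration; the real analysis runs on the clinic's file.

```python
import math

import pandas as pd

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# Tiny stand-in for the clinic data; column names are assumptions.
df = pd.DataFrame({
    "day_of_week": ["Mon", "Mon", "Mon", "Tue", "Tue", "Tue"],
    "status": ["No-show", "Completed", "Completed",
               "No-show", "Completed", "Completed"],
})

# Count no-shows and totals per day, then attach a rate and its interval.
summary = (
    df.assign(no_show=df["status"].eq("No-show"))
      .groupby("day_of_week")["no_show"]
      .agg(["sum", "count"])
)
summary["rate"] = summary["sum"] / summary["count"]
ci = summary.apply(lambda r: wilson_ci(r["sum"], r["count"]), axis=1)
summary["ci_low"] = ci.str[0]
summary["ci_high"] = ci.str[1]
print(summary)
```

Notice how wide the intervals are with only three appointments per day: with small per-group counts, the interval does most of the talking, which is exactly why the specification insists on them.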
Step 2: Break it down by visit type
Wanjiku has a specific suspicion: vaccination follow-ups are the worst for no-shows. She has watched it happen for months. Now you can check whether the data agrees.
Direct Claude to compute no-show rates by visit type, with confidence intervals. The visit types in the data are Consultation, Vaccination, Dental, and Surgery.
Look at the results. Vaccination should stand out — the highest no-show rate of the four categories. The confidence intervals matter here: if the intervals for two categories overlap heavily, the difference might be noise. If they do not overlap, the difference is likely real. A chart of these rates, with the intervals drawn in, is the kind of evidence Wanjiku can show her staff — clear, visual, honest about the uncertainty.
Step 3: What a hypothesis test does
You can see the differences in the chart. Vaccination is higher, Surgery is lower. But "looks different" is not the same as "is different." Small datasets produce patterns that look dramatic and mean nothing. With 8,000 rows, you have enough data that real patterns should be detectable — but you still need a formal way to check.
A hypothesis test answers a specific question: could this pattern have appeared by chance? Imagine flipping a coin 100 times and getting 70 heads. A chi-square test asks whether that result is too far from 50/50 to be explained by chance alone. It compares what you observed against what you would expect if there were no real pattern. Here, the chi-square test compares the distribution of no-shows across visit types against what you would expect if visit type had no relationship with no-shows at all. If the observed pattern is far enough from the expected pattern, the test returns a small p-value — meaning a pattern this strong would rarely appear if there were no real relationship.
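The coin example above can be checked directly. A minimal sketch with scipy (the 70/30 numbers are just the illustration from the text, not clinic data):

```python
from scipy.stats import chisquare

# 70 heads in 100 flips, compared against the 50/50 split a fair coin predicts.
# chi-square = (70-50)^2/50 + (30-50)^2/50 = 16, which is very far from 50/50.
stat, p = chisquare([70, 30], f_exp=[50, 50])
print(f"chi-square = {stat:.0f}, p = {p:.5f}")
```

The tiny p-value says a fair coin would almost never drift this far from 50/50, so "the coin is fair" is hard to sustain.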
Read the section on the chi-square test in materials/analysis-specification.md. The specification tells you what to test and what to check in the output.
Step 4: Run the chi-square test
Direct Claude to run a chi-square test on visit type versus appointment status. Ask for the test statistic, the p-value, and an interpretation.
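Under the hood, the test Claude runs will look something like this scipy sketch. The counts in the table are invented for illustration; the real contingency table comes from the clinic's data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Invented counts per visit type -- placeholders, not the real numbers.
table = pd.DataFrame(
    {"No-show": [120, 210, 80, 40], "Showed": [1880, 1290, 920, 960]},
    index=["Consultation", "Vaccination", "Dental", "Surgery"],
)

# chi2_contingency builds the expected counts under "no relationship"
# and measures how far the observed table is from them.
stat, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {stat:.1f}, dof = {dof}, p = {p:.2e}")
```

With four visit types and two outcomes, the degrees of freedom are (4 − 1) × (2 − 1) = 3, which is one of the things worth checking in the output.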
Check the p-value against the verification target in materials/verification-targets.md. The target says the chi-square test on visit type should be significant — meaning a p-value below 0.05.
Now read Claude's interpretation carefully. This is a place where AI commonly gets the narrative wrong. If Claude says something like "a p-value of 0.03 means there is a 3% chance the null hypothesis is true," that is incorrect. What the p-value actually means: if there were truly no relationship between visit type and no-shows, the pattern you observed would only appear about 3% of the time. The difference is subtle but real. The p-value is about the data given the hypothesis, not the hypothesis given the data.
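You can make this distinction concrete by brute force: assume the null is true and count how often chance alone produces a result at least as extreme as the one observed. This sketch reuses the coin example from Step 3.

```python
import random

random.seed(42)

# Simulate a fair coin (the null hypothesis) many times and ask how often
# chance alone yields a result at least as extreme as 70 heads out of 100
# (or 30, for a two-sided test).
trials = 20_000
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(100))
    if heads >= 70 or heads <= 30:
        extreme += 1

# This fraction approximates the p-value: the probability of data this
# extreme GIVEN the null -- not the probability that the null is true.
print(extreme / trials)
```

The printed fraction is tiny, which matches the formal test. At no point did the simulation compute "the chance the null hypothesis is true"; it only asked how the observed data would fare in a world where the null holds.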
Ask Claude to check its own interpretation. Be specific — "check whether your p-value interpretation is correct." Vague requests like "does this look right?" tend to produce agreement rather than genuine review. Specific checks work. Vague checks do not.
Step 5: Check the interpretation
Look at the full output now. You have no-show rates by category, confidence intervals, and a hypothesis test. Together, these answer a clear question: does the type of appointment affect whether the client shows up?
The answer should be yes. Vaccination follow-ups have the highest no-show rate, consistent with what Wanjiku suspected. The chi-square test confirms that the differences across visit types are statistically significant — not random variation.
But notice what the test does not tell you. It says the pattern is real. It does not say why. Maybe vaccination follow-ups feel less urgent to pet owners — the pet already got the shot, so the follow-up seems optional. Maybe the clinic's reminder system handles vaccination appointments differently. The data shows the pattern. The explanation comes from Wanjiku's knowledge of her practice.
This is a useful distinction to hold onto. Description and inference are different kinds of questions. "What is the no-show rate by visit type?" is description. "Are the differences real?" is inference. "Why do vaccination follow-ups have the highest rate?" is a question the data cannot answer on its own.
Step 6: A pattern Wanjiku did not ask about
The data dictionary includes a column called client_tenure — whether the client is new or returning. Wanjiku asked about days, times, and visit types. She did not ask about client tenure. But the column is there, and it is worth checking.
Direct Claude to compute no-show rates by client tenure, with confidence intervals.
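With only two groups, "meaningfully different" can also be checked with a two-proportion z-test rather than a chi-square. This is an optional sketch, not something the specification asks for, and the counts below are made up for illustration; the real split comes from the client_tenure column.

```python
import math

from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical counts: 90 no-shows among 600 new clients vs 95 among
# 1,400 returning clients. Placeholders only.
z, p = two_proportion_ztest(90, 600, 95, 1400)
print(f"z = {z:.2f}, p = {p:.2e}")
```

The same logic as the chi-square applies: a small p-value says a gap this large between the two groups would rarely arise if tenure had no relationship with no-shows.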
Look at the results. If new clients have a meaningfully different no-show rate than returning clients, that is information Wanjiku does not currently have. She built her questions around the variables she thinks about daily — the schedule, the appointment types. Client tenure is something she might not have considered as a factor in no-shows.
This is a small example of something that matters more as projects get more complex: the client knows their business, but the data sometimes reveals patterns they were not looking for. Part of the work is noticing when the data has something to say that the client did not think to ask about. You are not overriding Wanjiku's judgment — you are adding to her picture.
When you present findings later, the client tenure breakdown will be worth including. Not as a recommendation — you do not have enough information to tell Wanjiku what to do about it — but as something she should know about.
✓ Check: The chi-square test on visit_type should produce a significant result (p < 0.05). Vaccination follow-ups should show the highest no-show rate, consistent with Wanjiku's intuition.