Step 1: The statistical testing guide
Open materials/statistical-testing-guide.md. This is the core reference for the project. Read the introduction to hypothesis testing.
A hypothesis test asks: could this pattern be noise? Wei sees a 22% increase in new patient bookings. Monthly bookings vary naturally -- some months are higher, some lower, even without a campaign. The test tells you whether the observed increase is bigger than what normal variation would produce.
Every hypothesis test has two competing explanations:
- Null hypothesis: The campaign had no effect. The increase is consistent with normal seasonal variation.
- Alternative hypothesis: The campaign had an effect. The increase is larger than seasonal variation alone would produce.
The test computes a p-value -- the probability of seeing an increase at least this large if the null hypothesis were true. A small p-value means the data is hard to explain without a campaign effect. A large p-value means the data is consistent with normal variation.
The conventional threshold is alpha = 0.05. Below 0.05, the result is called "statistically significant." But this threshold is a convention, not a law. A p-value of 0.04 and a p-value of 0.06 represent similar evidence. The threshold helps standardize reporting, not thinking.
Step 2: Framing the primary hypothesis
Wei's question is "did the campaign work?" That is a business question, not a testable hypothesis. To test it statistically, you need to reframe it.
The testable version: "Is the proportion of new patient bookings during the campaign period significantly higher than the same proportion during the equivalent period in the prior year, excluding the Gaoxin clinic?"
Write this as:
- H0 (null): The new patient booking rate during Oct-Dec of the campaign year equals the rate during Oct-Dec of the prior year (excluding Gaoxin). Any difference is due to chance.
- H1 (alternative): The new patient booking rate during the campaign period differs from the prior year's rate. (The hypothesis states a claim about the rates themselves; "significant" describes the test result, not the hypothesis.)
The framing determines what kind of answer is possible. "Did the campaign work?" gets you a comparison chart. "Is the increase statistically significant after controlling for seasonality?" gets you a p-value and a confidence interval -- numbers that hold up at the board meeting.
Step 3: Defining the baseline
The seasonal baseline is the same period in the prior year: October through December. This controls for the Chinese New Year and school holiday effect that Wei already knows about.
But the baseline must exclude Gaoxin. That clinic did not exist during the prior year's Q4, and in the campaign year every patient there is classified as new. Including Gaoxin would inflate the campaign-period new-patient rate for reasons unrelated to the campaign.
Direct AI to compute both rates:
Calculate the new patient booking rate (new bookings / total bookings) for Oct-Dec of year 1 across the five original clinics. Then calculate the same rate for Oct-Dec of year 2, excluding Gaoxin. Show both rates.
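A minimal sketch of that computation, assuming a pandas DataFrame with `clinic`, `year`, and `is_new_patient` columns (the column names and the tiny inline dataset are illustrative, not the project's real schema):

```python
import pandas as pd

# Hypothetical Oct-Dec bookings; the real data's columns may differ.
bookings = pd.DataFrame({
    "clinic": ["Downtown", "Gaoxin", "Downtown", "Riverside", "Gaoxin", "Riverside"],
    "year": [1, 2, 2, 1, 2, 2],
    "is_new_patient": [True, True, False, False, True, True],
})

# Year 1 baseline: all five original clinics (no Gaoxin rows exist yet).
year1 = bookings[bookings["year"] == 1]
# Year 2 campaign period: exclude Gaoxin, where every patient is "new".
year2 = bookings[(bookings["year"] == 2) & (bookings["clinic"] != "Gaoxin")]

rate1 = year1["is_new_patient"].mean()  # new bookings / total bookings
rate2 = year2["is_new_patient"].mean()
print(f"Year 1 rate: {rate1:.2%}, Year 2 rate (excl. Gaoxin): {rate2:.2%}")
```

The mean of a boolean column is exactly new bookings divided by total bookings, which is why no explicit count-and-divide step is needed.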
Step 4: Test selection
Read the test selection section of the guide. The outcome variable is a proportion: the fraction of bookings during each period that come from new patients. A booking is either from a new patient or it is not. That is binary data.
The guide specifies: use a z-test for proportions when comparing rates between two groups. Do not use a t-test. A t-test compares means of continuous measurements -- revenue amounts, visit durations. Proportions are binary: the outcome for each observation is yes or no.
AI commonly applies a t-test whenever it sees "compare two groups" -- regardless of the data type. If AI proposes a t-test for this analysis, catch it. The wrong test produces a different p-value, and the conclusion can flip.
Step 5: Constraining AI's analysis plan
Direct AI to frame the analysis plan. Watch what test it proposes:
Frame an analysis plan to test whether the marketing campaign significantly increased new patient bookings. Compare Oct-Dec year 2 (excluding Gaoxin) to Oct-Dec year 1 as the seasonal baseline.
AI may default to a t-test. If it does, specify the constraint:
Use a z-test for proportions (proportions_ztest from statsmodels), not a t-test. The outcome is a proportion -- new patient bookings as a fraction of total bookings. That is binary data, not continuous.
This is constraint specification -- telling AI what test to use and why before any computation runs. Constraints shape AI's output more effectively than corrections after the fact.
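A sketch of the constrained test, using `proportions_ztest` from statsmodels as the guide directs. The counts here are placeholders; substitute the real Oct-Dec totals from Step 3.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts -- replace with the real Oct-Dec figures.
new_year1, total_year1 = 420, 2800   # prior-year baseline, five clinics
new_year2, total_year2 = 510, 2790   # campaign period, excluding Gaoxin

# Two-sample z-test for proportions, matching the H1 as framed (two-sided).
stat, p_value = proportions_ztest(
    count=[new_year2, new_year1],
    nobs=[total_year2, total_year1],
    alternative="two-sided",
)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```

Note that the function takes counts of successes and total observations per group, not precomputed rates; handing it the raw counts keeps the variance calculation correct.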
Step 6: Secondary hypotheses
Frame two additional hypotheses for the channel and category analyses:
Channel analysis: Does the campaign effect differ by channel? Each channel (WeChat ads, KOL, referral) can be tested separately -- is the proportion of bookings from that source significantly higher than expected?
Category analysis: Is the campaign effect associated with specific service categories? This is a question about whether two categorical variables (campaign/no-campaign and service category) are related. The chi-squared test is the right choice here.
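For the category analysis, the chi-squared test runs on a contingency table of booking counts. A sketch with `scipy.stats.chi2_contingency` follows; the table values are hypothetical stand-ins for the real counts per period and category.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = period, columns = service
# category booking counts. Replace with the real Oct-Dec counts.
table = np.array([
    [320, 180, 100],   # baseline Oct-Dec (prior year)
    [300, 260, 140],   # campaign Oct-Dec (excluding Gaoxin)
])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

A small p-value here says the category mix shifted between periods; it does not say which category drove the shift. Comparing the observed counts to the `expected` array shows where the deviation is concentrated.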
Step 7: Ask Wei about Gaoxin
You noticed from the profiling that Gaoxin's data starts mid-dataset. Ask Wei about it. Open the chat and ask about location-level differences:
I noticed that one of your six clinics -- the Gaoxin location -- has data starting in October of last year. Can you tell me about that clinic? Did it open recently?
Wei confirms: "Yes, Gaoxin opened October 15. All patients there are new -- it was a brand new clinic. I should have mentioned that. Don't include Gaoxin in the campaign numbers."
This confirms the exclusion: your profiling finding was correct, and the client has validated the analytical approach.
Check: You should have a null and alternative hypothesis for the primary question (campaign effect on new patient bookings), with the z-test for proportions selected. The Gaoxin clinic should be excluded from the campaign analysis.