Learn by Directing AI
Unit 1

Wei's email and the booking data

Step 1: Wei's email

Wei Liang is Director of Marketing at BrightSmile Dental, six dental clinics in Chengdu. He ran a marketing campaign -- WeChat ads, KOL partnerships, a referral bonus program -- that cost CNY 180,000. New patient bookings increased 22% during the campaign period.

Open the chat with Wei. Read his email. Five things he needs:

  1. Whether the 22% increase is real or just seasonal -- bookings always spike before Chinese New Year
  2. The seasonal effect separated from the campaign effect
  3. The new clinic's numbers excluded from the campaign impact
  4. A breakdown by channel -- which ones actually worked
  5. A board-ready number: "the campaign generated X additional patients at CNY Y per patient"

Wei is formal, direct, and busy. He writes structured emails with numbered points and expects clear answers. The board meets in three weeks.

Step 2: Project setup

Open a terminal and start Claude Code:

cd ~/dev
claude

Paste this prompt:

Create the folder ~/dev/analytics/p6. Download the project materials from https://learnbydirectingai.dev/materials/analytics/p6/materials.zip and extract them into that folder. Read CLAUDE.md -- it's the project governance file. If anything needs admin access, tell me what to run in a separate terminal.

Claude creates the folder, downloads the materials, and reads CLAUDE.md. That file describes Wei's situation, the deliverables, the tech stack, and the work breakdown. Once Claude confirms it has read CLAUDE.md, you are set up.

Confirm Docker and Metabase are still running from P5. If they are not, direct Claude to restart them.

Step 3: The data dictionary

Open materials/data-dictionary.md. It describes the booking data columns: date, clinic_id, patient_type, service_category, booking_source, revenue.

Six clinics across Chengdu: Jinniu, Wuhou, Chenghua, Qingyang, Shuangliu, Gaoxin. Two patient types: new and returning. Four service categories: general, cosmetic, orthodontics, implant. Five booking sources: walk-in, online_booking, referral, wechat_ad, kol.

Note the business terminology. A "new patient" is someone visiting any BrightSmile clinic for the first time. The booking_source field tracks how the patient was acquired -- walk-in, online booking, referral, or through one of the campaign channels.

Step 4: The campaign calendar

Open materials/campaign-calendar.md. Three channels ran during the campaign:

  • WeChat ads: October 1 through December 31 (CNY 80,000)
  • KOL partnerships: October 15 through December 15 (CNY 60,000)
  • Referral bonus: October 1 through December 31 (CNY 40,000)

Total spend: CNY 180,000. The referral bonus was CNY 200 per referred patient who completed a first appointment. The WeChat ads directed patients to a booking link. KOL partners posted sponsored content with unique booking codes.

Note which channels started when. The KOL channel started two weeks later than the other two. This matters for timing analysis.

Step 5: Data profiling

The booking data (materials/booking-data.csv) covers two years of daily bookings across all six clinics. Direct AI to load it into DuckDB and profile it. Be specific -- one thing at a time:

Load materials/booking-data.csv into DuckDB. Show me the column names, types, and row count.

You should see approximately 28,000 rows with six columns matching the data dictionary. Check the date range -- it should span two full calendar years.

Set up the session context now. Tell AI what this project is about, what the key metric definitions are, and what constraints matter. Context curation from the start prevents AI from inventing its own definitions later.

Step 6: Clinic distribution

Direct AI to show the row counts and earliest dates for each clinic:

Group by clinic_id and show the count and MIN(date) for each clinic. Order by MIN(date).

Five clinics should have data starting from the beginning of the dataset. One -- Gaoxin -- should have a later start date, somewhere around October of the second year. That means Gaoxin opened during the campaign period.

Every patient at a new clinic is classified as "new" because nobody has visited before. If Gaoxin is included in the campaign analysis, its new-patient numbers inflate the campaign's apparent effect. This is not a data quality error -- the data is correct. It is a confounding factor: an alternative explanation for the increase in new patients.

Step 7: Attribution review

Look at the booking_source distribution. Direct AI to count bookings by source across the full dataset:

Count bookings by booking_source, including nulls. Show the counts and percentages.

The wechat_ad and kol sources should only appear during the campaign period (October through December of the second year). Before the campaign, those channels did not exist. The referral source appears throughout the dataset -- both organic referrals and bonus-driven referrals share the same tag.

Notice the null booking_source values. About 3% of bookings have no source recorded -- primarily walk-in patients where the front desk did not capture the information. These are not errors, but they represent bookings where the attribution is unknown.

Consider what this means for the analysis. If a patient saw a WeChat ad but booked by phone instead of using the link, they are tagged as "walk-in" or "online_booking" -- not as "wechat_ad." The campaign channel data understates the true campaign effect. Keep this in mind when interpreting the statistical tests.

✓ Check

Check: You should have approximately 28,000 rows across six clinics with two years of daily data. The Gaoxin clinic should have data starting partway through the dataset (approximately at the campaign start date), and the booking_source column should show five categories.