Learn by Directing AI
Unit 2

Defining the success metric and framing the questions

Step 1: What does "conversion rate" mean?

Before running any test, define the metric. "Conversion rate" sounds straightforward, but it is not. Does it count every visitor, or only first-time visitors? Does it count "booked any tour" or "booked a specific tour type"? Does a visitor who browses three tours and books one count as one conversion or three opportunities?

Direct AI to compute conversion rates under different definitions:

Using the ab-test-data.csv, compute the overall conversion rate for each page version under two definitions: (1) booking_completed = true / total visitors per page version, and (2) booking_completed = true / unique visitors who viewed at least one tour per page version. Show both results side by side.

The numbers will differ depending on the definition. AI will compute whichever definition it infers from the column names unless you specify. This is why the metric must be defined before the test runs -- "conversion rate" computed one way can tell a different story than "conversion rate" computed another way, from the same data.
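To see the divergence concretely, here is a minimal pandas sketch. The toy rows and the `tours_viewed` column name are assumptions standing in for ab-test-data.csv; only `page_version` and `booking_completed` come from the brief.

```python
import pandas as pd

# Toy stand-in for ab-test-data.csv -- tours_viewed is an assumed column name
df = pd.DataFrame({
    "page_version":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "tours_viewed":      [0, 2, 1, 3, 0, 1, 2, 0],
    "booking_completed": [False, True, False, True,
                          False, True, True, False],
})

# Definition 1: bookings / all visitors per page version
def1 = df.groupby("page_version")["booking_completed"].mean()

# Definition 2: bookings / visitors who viewed at least one tour
def2 = (df[df["tours_viewed"] > 0]
        .groupby("page_version")["booking_completed"].mean())

comparison = pd.DataFrame({"all_visitors": def1, "viewed_a_tour": def2})
print(comparison)
```

On these toy rows, definition 1 gives 0.50 for both versions while definition 2 gives roughly 0.67 for A and 1.00 for B: same data, different story.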

Step 2: Define the primary metric

Choose the primary success metric: overall conversion rate, defined as the proportion of visitors who completed a booking (booking_completed = true) out of all visitors who saw that page version.

Document the definition precisely. In a notebook cell or markdown file, write:

  • Metric name: Overall conversion rate
  • Numerator: Visitors with booking_completed = true
  • Denominator: All visitors assigned to that page version
  • Scope: All visitor sources, all tour types
  • Excludes: Nothing -- every visitor counts

This definition is the contract. When AI computes the test, it must use this definition. If you change the definition later, the test results change with it. Metric integrity starts with precision.
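One way to make the contract executable (a sketch; the function name is mine, not from the brief) is to encode the definition as a single function that every later computation calls:

```python
import pandas as pd

def overall_conversion_rate(df: pd.DataFrame, version: str) -> float:
    """Primary metric, exactly as documented.

    Numerator:   visitors with booking_completed == True
    Denominator: all visitors assigned to that page version
    Excludes:    nothing -- every visitor counts
    """
    assigned = df[df["page_version"] == version]
    return assigned["booking_completed"].sum() / len(assigned)

# Toy rows standing in for ab-test-data.csv
df = pd.DataFrame({
    "page_version":      ["A", "A", "B", "B", "B", "B"],
    "booking_completed": [True, False, True, True, True, False],
})
print(overall_conversion_rate(df, "A"))  # 0.5
print(overall_conversion_rate(df, "B"))  # 0.75
```

If the definition ever changes, it changes in one place, and every downstream number changes with it.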

Step 3: Frame the questions

The data can answer several questions. Not just "is the new page better?" but:

  1. Does the new page have a significantly higher overall conversion rate?
  2. Does the effect differ by tour type -- especially for premium treks (Huayna Potosi and Cordillera Real)?
  3. Does the effect differ by visitor source (organic, paid ads, hostel referrals, agency)?

Frame these as separate tests. Each gets its own row in the statistical testing template. Open materials/statistical-testing-template.md and look at the Detailed Results table -- that is where the answers will go.

The student who asks only question 1 and reports the overall number gives Marco an incomplete picture. Marco's operations manager already knows the overall number is up. What she wants to know is why premium bookings are down.

Step 4: Conversion rates by tour type

Direct AI to compute conversion rates by tour type and page version:

Compute the booking rate for each combination of page_version and tour_selected. Show a table with tour type, page A rate, page B rate, and the difference. Include an "overall" row.
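A pandas sketch of that table, using toy rows in place of ab-test-data.csv. The tour names and counts are illustrative, deliberately constructed so the aggregate and a subgroup disagree:

```python
import pandas as pd

# Toy stand-in for ab-test-data.csv, built so page B wins overall
# while the premium trek does worse on B
df = pd.DataFrame({
    "page_version":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "tour_selected": ["Huayna Potosi", "Huayna Potosi",
                      "City Tour", "City Tour"] * 2,
    "booking_completed": [True, False, False, False,
                          False, False, True, True],
})

# Booking rate per tour type and page version, plus the B - A difference
rates = df.pivot_table(index="tour_selected", columns="page_version",
                       values="booking_completed", aggfunc="mean")
rates["difference"] = rates["B"] - rates["A"]

# Append an "overall" row computed across all tours
overall = df.groupby("page_version")["booking_completed"].mean()
rates.loc["overall"] = [overall["A"], overall["B"],
                        overall["B"] - overall["A"]]
print(rates)
```

On these rows the overall difference is +25 points while Huayna Potosi drops 50 points: the Simpson's-paradox shape the real data shows in milder form.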

The pattern should be visible. The overall rate is higher for page B. But look at the premium treks -- Huayna Potosi and Cordillera Real. Their booking rates are lower on page B. The overall average is masking an opposite effect in a subgroup. This is a form of Simpson's paradox: the aggregate tells one story while the components tell another.

This is the first time the brief hasn't told you what question to answer. Marco said "look at the numbers properly." You decided to segment by tour type. That decision -- what to investigate -- is now your responsibility.

Step 5: The pricing display

Something is driving the premium trek decline. Ask Marco how prices appear on each page version.

Message Marco about the pricing display. He hadn't thought of it as relevant: the new page shows the full price per person, while the old page showed "from $120 per person" with group discounts visible underneath. His developer said the cleaner layout was better.

This is a confound. Premium treks have tiered pricing based on group size. On the old page, visitors saw the group discount immediately. On the new page, they see only the individual price. A three-day Cordillera Real trek at $350 per person looks expensive when the group discount that brings it to $240 per person is hidden on a sub-page. The pricing display difference -- not the page design itself -- may explain the premium trek decline.

Marco discovered this only because you asked. He would not have mentioned it unprompted.

Step 6: Document before testing

Before running any statistical tests, document what you have:

  1. Primary metric defined: Overall conversion rate (booking_completed = true / total visitors per page version)
  2. Questions framed: Overall test, per-tour-type test, per-source test
  3. Confound identified: Pricing display difference between page versions for premium treks
  4. Hypothesis: The overall effect is real but the premium trek decline is a pricing display artifact, not a page design problem

Write this into a notebook cell or your analysis file. The next unit computes the tests. Having the metric defined and the questions framed before the computation starts means you can verify the results against your expectations -- and catch it if AI uses a different definition than the one you specified.
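A sketch of that verification step: recompute the metric yourself under the documented definition and compare it against whatever AI reports. The reported numbers below are placeholders, not results from the real data.

```python
import pandas as pd

def verify_reported_rates(df, reported, tol=1e-6):
    """Recompute the primary metric per the documented definition and
    return any page versions where the AI-reported value disagrees."""
    computed = df.groupby("page_version")["booking_completed"].mean()
    return {v: (computed[v], r) for v, r in reported.items()
            if abs(computed[v] - r) > tol}

# Toy rows standing in for ab-test-data.csv
df = pd.DataFrame({
    "page_version":      ["A", "A", "B", "B"],
    "booking_completed": [True, False, True, True],
})

# Suppose AI reported these rates -- B is wrong here on purpose
mismatches = verify_reported_rates(df, {"A": 0.5, "B": 0.9})
print(mismatches)  # only B appears: computed 1.0, reported 0.9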

✓ Check

The overall conversion rate for the new page should be approximately 2-3 percentage points higher than the old page. The premium trek conversion rate should be approximately 6-10 percentage points lower on the new page.