Step 1: Structure the report
Open materials/statistical-testing-template.md. This is the reporting format for experiment results. Six sections: test setup, metric definition, results summary, detailed results, confounds and limitations, and recommendation.
Direct AI to help you fill in each section using the analysis from Units 2-5, but verify every number it puts in the template against your own results. The template is the structure; the content is yours.
Start with the test setup and metric definition -- these should come directly from your Unit 2 documentation. The metric definition section should be precise enough that someone else could reproduce the exact same test from the same data and get the same numbers.
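A reproducibility test for your metric definition: it should translate directly into a few lines of code. Here is a minimal sketch, where the field names (`page_version`, `booked`) and the sample rows are hypothetical placeholders -- match them to your actual Unit 2 data.

```python
def conversion_rate(rows, page_version):
    """Conversion rate = booked sessions / all sessions for one page version."""
    sessions = [r for r in rows if r["page_version"] == page_version]
    booked = [r for r in sessions if r["booked"] == "1"]
    return len(booked) / len(sessions) if sessions else 0.0

# Hypothetical sample rows standing in for the experiment log
rows = [
    {"page_version": "A", "booked": "1"},
    {"page_version": "A", "booked": "0"},
    {"page_version": "B", "booked": "1"},
    {"page_version": "B", "booked": "1"},
]
rate_a = conversion_rate(rows, "A")  # 1 booking / 2 sessions = 0.5
```

If someone else can run the same function over the same data and get your numbers, the definition is precise enough.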
Step 2: Visualise the findings
Create the chart that will go in Marco's report. This is the visual that communicates the full story: conversion rates by tour type and page version, with confidence interval error bars.
Create a grouped bar chart showing conversion rates for page A and page B across four groups: Death Road, Premium Treks (combined), Paragliding, and Overall. Add 95% confidence interval error bars to each bar. Use clear labels and a legend. Title: "A/B Test Results by Tour Type."
The chart must show uncertainty. Bar heights without error bars communicate false precision -- "Death Road conversion is 23% on the new page" sounds definitive. With error bars, it becomes "Death Road conversion is between 19% and 27% on the new page." Marco can see where the ranges overlap (paragliding -- no clear winner) and where they don't (Death Road -- new page is clearly better; premium treks -- old page is clearly better).
Showing only the point estimate when there is statistical uncertainty is dishonest. The error bars are not decorations.
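One way to build this chart is with matplotlib's `yerr` parameter, which draws the error bars directly on the bars. The sketch below uses a normal-approximation (Wald) 95% interval; the conversion and session counts are hypothetical placeholders -- substitute your per-segment numbers from Units 2-5.

```python
import math
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def ci_halfwidth(conversions, n, z=1.96):
    """95% normal-approximation CI half-width for a conversion rate."""
    p = conversions / n
    return z * math.sqrt(p * (1 - p) / n)

groups = ["Death Road", "Premium Treks", "Paragliding", "Overall"]
# (conversions, sessions) per group -- hypothetical numbers
page_a = [(150, 1000), (80, 500), (60, 400), (290, 1900)]
page_b = [(230, 1000), (50, 500), (65, 400), (345, 1900)]

rates_a = [c / n for c, n in page_a]
rates_b = [c / n for c, n in page_b]
err_a = [ci_halfwidth(c, n) for c, n in page_a]
err_b = [ci_halfwidth(c, n) for c, n in page_b]

x = range(len(groups))
width = 0.35
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar([i - width / 2 for i in x], rates_a, width, yerr=err_a,
       capsize=4, label="Page A (old)")
ax.bar([i + width / 2 for i in x], rates_b, width, yerr=err_b,
       capsize=4, label="Page B (new)")
ax.set_xticks(list(x))
ax.set_xticklabels(groups)
ax.set_ylabel("Conversion rate")
ax.set_title("A/B Test Results by Tour Type")
ax.legend()
fig.savefig("ab_test_results.png", dpi=150)
```

With real counts, overlapping error bars (paragliding) and separated ones (Death Road, premium treks) become visible at a glance -- exactly the nuance Marco needs.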
Step 3: Write the recommendation
The honest answer is nuanced. Not "the new page is better" and not "the new page is worse." The data supports a specific, conditional recommendation:
- Keep the new page for standard tours (Death Road). The booking rate improvement is statistically significant and practically meaningful. The effect holds even when you account for the ad budget confound.
- Revert or modify the premium trek pricing display. The premium trek decline is likely caused by the pricing display change (showing individual prices without visible group discounts), not the overall page design. Test a modified version that shows group pricing for premium treks while keeping the new layout.
- Don't change ad budgets during future tests. The traffic composition shift made the results harder to interpret.
- Run a separate test for Spanish and French pages. The current results apply only to English-language visitors.
Direct AI to draft the recommendation, then review it. Check that AI communicates the nuance rather than collapsing to a simple verdict. If AI writes "the new page is better overall" without the per-tour-type breakdown and confound analysis, the recommendation misleads Marco.
Step 4: Future test advice
Marco asked for help running better tests in the future. The analysis surfaced specific lessons:
- Define the success metric before the test starts, not after
- Don't change advertising budgets or traffic sources mid-test
- Test one change at a time -- the pricing display and page layout were changed simultaneously
- Include all visitor segments (languages) or document which segments are excluded
- Estimate the sample size needed before running the test, especially for subgroups
These are not abstract principles. Each one maps to a specific problem in this test that made the results harder to interpret. Marco understands them because he lived through the consequences.
Step 5: Send the report to Marco
Send the completed experiment report to Marco. He responds warmly -- relieved to finally understand what happened.
Marco's reaction: "Madre mía, this makes so much more sense now. So the new page IS better for Death Road tours but the pricing change is what killed the premium treks? That makes total sense -- Camila was right but for the wrong reason."
He asks a follow-up: "Can we also look at which tour packages are most popular by season? The high season is coming and I need to decide which tours to promote."
This is scope creep -- reasonable but separate. The seasonal analysis is a different project with different data needs. You can redirect: "That's a great question, but it needs different data -- booking patterns across multiple years, not just this 60-day experiment. Let me finish documenting this analysis and we can scope the seasonal work separately."
Marco won't push hard. He respects the boundary.
Step 6: Commit and push
Commit your analysis to Git with a descriptive message:
```shell
git add -A
git commit -m "Complete A/B test analysis for Cumbre Adventures: overall positive effect for new booking page on standard tours, negative effect on premium treks due to pricing display, ad budget confound identified and accounted for"
git push origin main
```
The project is complete. You've delivered an experiment report that honestly communicates what the data can and cannot prove: the new page improves standard tour bookings, the premium trek decline is a pricing display problem with a specific fix, and the ad budget confound inflated the apparent effect. Marco has an actionable recommendation and a framework for running better tests.
✓ Check: The report should include: test setup, metric definition, p-value with CI for overall and per-tour-type, confound assessment, clear recommendation with reasoning, and forward-looking suggestions for future tests.