Nile Compass Tours Booking Analysis

Client

Hassan El-Amin, Founder and Managing Director of Nile Compass Tours (Cairo, Egypt). Nile Compass Tours is a growing tour operator offering private cultural tours, multi-day Egypt itineraries, and Nile cruise packages, with about 3,000 bookings per year.

What you are building

An inferential analysis of booking patterns to determine which factors are associated with booking growth -- specifically, whether the shift to digital marketing 18 months ago is associated with increased bookings after controlling for seasonality and exchange rate trends. The inference is supported by a descriptive analysis of seasonal patterns to inform staffing decisions.

Tech stack

Python 3.11+ (conda "ds" environment)
Jupyter Notebook
pandas
statsmodels (OLS regression, hypothesis tests, assumption checks)
scipy (statistical tests, effect size calculations)
scikit-learn (supplementary modeling if needed)
matplotlib / seaborn (visualization)

File structure

p6/
  materials/
    bookings.csv              -- 3 years of booking data (~6,300 rows)
    marketing-spend.csv       -- Monthly marketing spend by channel (144 rows)
    data-dictionary.md        -- Field definitions for both datasets
    methodology-memo-template.md -- Template for documenting analytical decisions
    CLAUDE.md                 -- This file
  analysis.ipynb              -- Main analysis notebook (student creates)
  findings-summary.md         -- Findings for Hassan (student creates)
  technical-appendix.md       -- Methodology for silent partner (student creates)
  methodology-memo.md         -- Completed methodology memo (student creates)
  decision-record.md          -- Key decision documentation (student creates)

Key analytical concepts

Question typology: The brief is ambiguous ("understand our booking patterns"). The student must determine whether this is a descriptive, inferential, predictive, or causal question.
Inference vs prediction: Hassan needs to know whether the marketing shift worked (inference), not how many bookings to expect next quarter (prediction).
Effect sizes: Statistical significance alone is not enough. Effect sizes tell Hassan whether the finding is large enough to act on.
Self-reported attribution: The marketing_channel field is self-reported and systematically biased. This limitation must be documented.
Confounding: Multiple factors changed around the same time (marketing shift, exchange rate, new tour types). The regression controls for measured confounders but cannot prove causation.
Assumption checking: Check regression assumptions (normality of residuals, homoscedasticity, absence of strong multicollinearity among predictors) before interpreting coefficients.

Task list

Profile data -- load and profile bookings.csv and marketing-spend.csv
Determine question type -- analyze Hassan's brief, identify the question type, document the framing decision
Clean and describe -- handle cancellations, investigate attribution reliability, run descriptive analysis
Run inferential analysis -- regression with marketing_shift as key predictor, compute effect sizes
Design and run validation -- design validation strategy, meta-prompting, sensitivity analysis, cross-model review
Translate findings -- translate statistical findings into business terms, prepare two deliverables
Deliver and close -- send to Hassan, handle scope extensions, write decision record, commit and push
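
The inferential step in the task list could look roughly like the sketch below: a monthly OLS regression with marketing_shift as the key predictor, month-of-year dummies for seasonality, and an exchange rate control. It is a minimal sketch on synthetic data, not the project's prescribed model. The column names month, exchange_rate, and bookings, the date range, and the choice of a relative-change effect size are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic monthly series; NOT the real data. Dates are placeholders.
rng = np.random.default_rng(42)
months = pd.date_range("2021-01-01", periods=36, freq="MS")
monthly = pd.DataFrame({
    "month": months.month,                                 # 1..12, seasonality
    "marketing_shift": (np.arange(36) >= 18).astype(int),  # 1 = after the shift
    "exchange_rate": rng.normal(30, 2, 36),                # hypothetical control
})
seasonal = 20 * np.cos(2 * np.pi * (monthly["month"] - 1) / 12)
monthly["bookings"] = (
    170 + seasonal + 25 * monthly["marketing_shift"] + rng.normal(0, 8, 36)
)

# C(month) expands month-of-year into dummy variables.
fit = smf.ols(
    "bookings ~ marketing_shift + exchange_rate + C(month)", data=monthly
).fit()

coef = fit.params["marketing_shift"]
ci_low, ci_high = fit.conf_int().loc["marketing_shift"]
p_value = fit.pvalues["marketing_shift"]

# One simple effect size: adjusted change relative to the pre-shift average.
pre_shift_mean = monthly.loc[monthly["marketing_shift"] == 0, "bookings"].mean()
pct_change = 100 * coef / pre_shift_mean

print(f"Adjusted difference: {coef:.1f} bookings/month "
      f"(95% CI {ci_low:.1f} to {ci_high:.1f}, p={p_value:.3g})")
print(f"Relative to pre-shift mean: {pct_change:.1f}%")
```

Reporting the coefficient with its confidence interval and as a percentage of the pre-shift average gives Hassan a magnitude to act on rather than a p-value alone. The coefficient remains an association: anything that changed at the same time as the marketing shift and is not in the model is absorbed into it.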

Verification targets

Question type documented and justified (inference, not prediction)
Cancelled bookings (~15%) separated from confirmed
Self-reported attribution limitation documented
Regression assumptions checked before interpreting coefficients
Effect sizes computed alongside p-values
Cross-model review completed with a separate AI context
All five of Hassan's requirements addressed
Findings stated in business terms, not statistical language
Confounding limitations stated honestly (association, not causation)
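
The cancellation target above can be checked along the following lines. This is a sketch on a six-row toy table: the column names booking_id, booking_date, and status, and the value "cancelled", are hypothetical and should be replaced with whatever data-dictionary.md actually defines.

```python
import pandas as pd

# Toy table standing in for bookings.csv; column names are assumptions.
bookings = pd.DataFrame({
    "booking_id": [1, 2, 3, 4, 5, 6],
    "booking_date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-02",
        "2023-02-14", "2023-02-27", "2023-03-03",
    ]),
    "status": ["confirmed", "cancelled", "confirmed",
               "confirmed", "cancelled", "confirmed"],
})

# Split once, then reuse the mask so the two subsets cannot overlap.
is_cancelled = bookings["status"].str.lower().eq("cancelled")
confirmed = bookings.loc[~is_cancelled]
cancelled = bookings.loc[is_cancelled]

# Share of cancelled rows; compare against the expected ~15% on the real data.
cancel_rate = is_cancelled.mean()

# Monthly confirmed counts, the kind of series a regression would use.
monthly_confirmed = (
    confirmed.groupby(confirmed["booking_date"].dt.to_period("M"))
    .size()
    .rename("bookings")
)

# Sanity check: every row lands in exactly one subset.
assert len(confirmed) + len(cancelled) == len(bookings)
print(f"Cancellation rate: {cancel_rate:.1%}")
print(monthly_confirmed)
```

Keeping the cancelled rows in their own frame, rather than dropping them, leaves room to describe cancellation patterns separately if Hassan asks.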

Commit convention

Commit after each major analytical milestone with a meaningful message describing what was decided and why. Example: "feat: determine question type as inference, document prediction alternative"