Baan Suan Hotels -- Multi-Source Property Performance Analysis
Project
You are analyzing guest experience and property performance for Somchai Rattanapong, Director of Operations at Baan Suan Hotels. Baan Suan operates five boutique properties across Thailand: Koh Samui (beach resort), Krabi (beach resort), Bangkok (city hotel), Chiang Mai (cultural property), and Khao Yai (nature retreat).
What you are building
A findings report for the board that combines three data sources (bookings, reviews, revenue), determines which differences in guest satisfaction between properties are real and which are noise, identifies what drives satisfaction, and presents the results in board-ready language.
Tech stack
- Python 3.11+ (conda "ds" environment)
- Jupyter Notebook
- pandas
- DuckDB (SQL-based multi-source analysis)
- scikit-learn (Ridge, Lasso, cross_val_score)
- scipy (statistical tests, assumption checking)
- matplotlib / seaborn
- Claude Code
- Git / GitHub
Data sources
- materials/bookings.csv -- reservation data (dates in DD/MM/YYYY, ~5,400 rows)
- materials/reviews.csv -- guest satisfaction scores from TripAdvisor and Booking.com (dates in MM/DD/YYYY, ~3,200 rows)
- materials/revenue.csv -- monthly revenue reports per property (90 rows)
- Data dictionaries: materials/bookings-dictionary.md, materials/reviews-dictionary.md, materials/revenue-dictionary.md
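The two date layouts listed above have to be reconciled before the sources can be joined. The following is a minimal sketch of one way to do that in pandas, assuming hypothetical column names (`checkin_date`, `review_date`, `month`, `revenue`) and tiny made-up frames in place of the real CSVs; the actual names and the layout of the revenue month field are defined in the data dictionaries.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the CSV files (illustration only).
bookings = pd.DataFrame({
    "booking_id": [1, 2, 3],
    "property": ["Krabi", "Krabi", "Bangkok"],
    "checkin_date": ["03/04/2024", "17/04/2024", "25/12/2024"],  # DD/MM/YYYY
})
reviews = pd.DataFrame({
    "property": ["Krabi", "Bangkok"],
    "review_date": ["04/03/2024", "12/25/2024"],  # MM/DD/YYYY
})
revenue = pd.DataFrame({
    "property": ["Krabi", "Bangkok"],
    "month": ["2024-04", "2024-12"],  # assumed layout of the month field
    "revenue": [1_200_000, 2_500_000],
})

# An explicit format makes pandas raise on values that do not fit,
# instead of silently guessing day-first vs month-first.
bookings["checkin_date"] = pd.to_datetime(bookings["checkin_date"], format="%d/%m/%Y")
reviews["review_date"] = pd.to_datetime(reviews["review_date"], format="%m/%d/%Y")

# Derive a join key that matches the monthly revenue rows.
bookings["month"] = bookings["checkin_date"].dt.strftime("%Y-%m")

# validate="many_to_one" raises if (property, month) is duplicated in revenue,
# which is the usual cause of a join silently multiplying rows.
rows_before = len(bookings)
joined = bookings.merge(revenue, on=["property", "month"], how="left",
                        validate="many_to_one")
rows_after = len(joined)
assert rows_after == rows_before, "join changed the row count"

# Bookings with no matching revenue row show up as nulls after a left join.
unmatched = int(joined["revenue"].isna().sum())
print(f"rows before={rows_before}, after={rows_after}, unmatched={unmatched}")
```

Passing `format=` explicitly is the point of the sketch: a value such as 03/04/2024 is a valid date under both layouts, so inferring the format per file can produce plausible but wrong dates without any error.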
Key materials
- materials/client-email.md -- Somchai's initial email with requirements
- materials/methodology-memo-template.md -- template the student fills in progressively
Tickets
- T1: Profile all three datasets independently (shape, types, nulls, date formats)
- T2: Join sources -- resolve date format discrepancy (DD/MM vs MM/DD), standardize revenue categories, verify row counts
- T3: Inferential analysis -- assumption checking, ANOVA or Kruskal-Wallis, effect sizes, pairwise comparisons with correction, interaction effects
- T4: Prediction model -- feature selection, regularization (Ridge/Lasso), cross-validation reporting mean and standard deviation, feature importance
- T5: Cross-model review -- second AI reviews methodology memo with fresh context
- T6: Board findings report -- translate statistical results to business language, address all four requirements
- T7: Deliver to Somchai, handle scope extension, write decision record, push to GitHub
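For T3, the sequence of assumption checks, omnibus test, effect size, and corrected pairwise comparisons could look roughly like the sketch below. It runs on synthetic scores for three properties (not real data), uses a 0.05 threshold as an assumed convention, picks Mann-Whitney U for the pairwise step, and implements the Holm correction by hand; these are illustrative choices, not the project's prescribed method, and interaction effects are not covered.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic satisfaction scores per property (illustration only).
scores = {
    "Koh Samui": rng.normal(8.2, 0.8, 60),
    "Krabi": rng.normal(8.1, 0.8, 60),
    "Bangkok": rng.normal(7.4, 0.8, 60),
}
groups = list(scores.values())
n_total = sum(len(g) for g in groups)

# Assumption checks: Shapiro-Wilk per group (normality), Levene (equal variances).
normal_ok = all(stats.shapiro(g).pvalue > 0.05 for g in groups)
equal_var_ok = stats.levene(*groups).pvalue > 0.05

if normal_ok and equal_var_ok:
    # One-way ANOVA, with eta squared = SS_between / SS_total as effect size.
    omnibus = stats.f_oneway(*groups)
    pooled = np.concatenate(groups)
    ss_between = sum(len(g) * (g.mean() - pooled.mean()) ** 2 for g in groups)
    effect_size = ss_between / ((pooled - pooled.mean()) ** 2).sum()
else:
    # Kruskal-Wallis, with epsilon squared = H / (n - 1) as effect size.
    omnibus = stats.kruskal(*groups)
    effect_size = omnibus.statistic / (n_total - 1)

# Pairwise two-sided Mann-Whitney U tests for every pair of properties.
pairs = list(combinations(scores, 2))
raw_p = np.array([stats.mannwhitneyu(scores[a], scores[b]).pvalue for a, b in pairs])

# Holm correction: multiply the i-th smallest p-value by (m - i), enforce
# monotonicity with a running maximum, and cap at 1.
m = len(raw_p)
order = np.argsort(raw_p)
stepped = np.maximum.accumulate((m - np.arange(m)) * raw_p[order])
holm_p = np.empty(m)
holm_p[order] = np.minimum(stepped, 1.0)

print(f"omnibus p={omnibus.pvalue:.4g}, effect size={effect_size:.3f}")
for (a, b), p_raw, p_adj in zip(pairs, raw_p, holm_p):
    print(f"{a} vs {b}: raw p={p_raw:.4g}, Holm p={p_adj:.4g}")
```

Reporting the effect size next to the p-value matters here because, with thousands of reviews, even a trivially small difference between properties can reach statistical significance.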
Verification guidance
- Check row counts before and after every join
- Check date formats match across sources before joining
- Run assumption checks (normality, homoscedasticity) before statistical tests
- Report effect sizes alongside p-values
- Apply multiple comparison corrections for pairwise tests
- Report cross-validation mean AND standard deviation
- Keep leaky features out of the prediction model (features that encode the target or would not be known at prediction time)
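The cross-validation item above can be sketched as follows. This is a minimal example on synthetic features and a synthetic target (not project data); the `alpha` value, the 5-fold split, and R^2 as the metric are assumptions chosen for illustration. Putting the scaler inside the pipeline keeps preprocessing statistics from the held-out fold out of training, which guards against one form of leakage; it does not detect leaky features, which still need to be screened by hand.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic data (illustration only): 200 rows, 5 numeric features.
X = rng.normal(size=(200, 5))
true_coef = np.array([1.0, 0.5, 0.0, 0.0, -0.3])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# The scaler is refit on each training fold because it lives inside the pipeline.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

# Report the mean together with the spread across folds.
print(f"R^2: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```

A large standard deviation relative to the mean suggests the score depends heavily on which rows land in each fold, which is worth stating in the methodology memo alongside the headline number.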
Commit convention
Commit after completing each ticket. Use descriptive messages that explain what analytical work was done, not just what files changed.