Baan Suan Hotels -- Multi-Source Property Performance Analysis

Project

You are analyzing guest experience and property performance for Somchai Rattanapong, Director of Operations at Baan Suan Hotels. Baan Suan operates five boutique properties across Thailand: Koh Samui (beach resort), Krabi (beach resort), Bangkok (city hotel), Chiang Mai (cultural property), and Khao Yai (nature retreat).

What you are building

A findings report for the board that combines three data sources (bookings, reviews, revenue), separates real property differences in guest satisfaction from noise, identifies what drives satisfaction, and presents the results in board-ready language.

Tech stack

Python 3.11+ (conda "ds" environment)
Jupyter Notebook
pandas
DuckDB (SQL-based multi-source analysis)
scikit-learn (Ridge, Lasso, cross_val_score)
scipy (statistical tests, assumption checking)
matplotlib / seaborn
Claude Code
Git / GitHub

Data sources

materials/bookings.csv -- reservation data (dates in DD/MM/YYYY, ~5,400 rows)
materials/reviews.csv -- guest satisfaction scores from TripAdvisor and Booking.com (dates in MM/DD/YYYY, ~3,200 rows)
materials/revenue.csv -- monthly revenue reports per property (90 rows)
Data dictionaries: materials/bookings-dictionary.md, materials/reviews-dictionary.md, materials/revenue-dictionary.md
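
Because the two files use opposite day/month orders, parse each with an explicit format instead of letting pandas guess. A minimal sketch, assuming hypothetical column names (checkin_date, review_date) and made-up sample values; take the real column names from the data dictionaries:

```python
import pandas as pd

# Hypothetical column names and sample values; check the data dictionaries
# for the real ones before reusing this.
bookings = pd.DataFrame({"checkin_date": ["03/04/2024", "25/12/2024"]})  # DD/MM/YYYY
reviews = pd.DataFrame({"review_date": ["04/03/2024", "12/25/2024"]})    # MM/DD/YYYY

# An explicit format string stops pandas from guessing day/month order.
bookings["checkin_date"] = pd.to_datetime(bookings["checkin_date"], format="%d/%m/%Y")
reviews["review_date"] = pd.to_datetime(reviews["review_date"], format="%m/%d/%Y")

# After parsing, both sources hold the same calendar dates
# (3 April 2024 and 25 December 2024).
dates_match = (bookings["checkin_date"] == reviews["review_date"]).all()
print(bookings["checkin_date"].dt.strftime("%Y-%m-%d").tolist(), dates_match)
```

With an explicit format, a value such as 25/12/2024 in the MM/DD file raises an error instead of being silently reinterpreted, which is usually what you want during profiling.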

Key materials

materials/client-email.md -- Somchai's initial email with requirements
materials/methodology-memo-template.md -- template the student fills in progressively as tickets are completed

Tickets

T1: Profile all three datasets independently (shape, types, nulls, date formats)
T2: Join sources -- resolve date format discrepancy (DD/MM vs MM/DD), standardize revenue categories, verify row counts
T3: Inferential analysis -- assumption checking, ANOVA or Kruskal-Wallis, effect sizes, pairwise comparisons with correction, interaction effects
T4: Prediction model -- feature selection, regularization (Ridge/Lasso), cross-validation reporting mean and standard deviation, feature importance
T5: Cross-model review -- second AI reviews methodology memo with fresh context
T6: Board findings report -- translate statistical results to business language, address all four requirements
T7: Deliver to Somchai, handle scope extension, write decision record, push to GitHub
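
For T4, one possible shape of the modeling step is sketched below. Everything in it is an assumption for illustration: the data is synthetic, the four features are made up, and alpha=1.0 and 5 folds are placeholder choices, not project decisions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 4 made-up features, a noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.8, 0.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=200)

# Scaling lives inside the pipeline so each fold fits the scaler on its
# training split only, which keeps validation data out of preprocessing.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Report both the mean and the spread of the fold scores.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"R^2 across folds: mean={scores.mean():.3f}, std={scores.std():.3f}")

# Coefficients on standardized features give a rough view of feature importance.
model.fit(X, y)
coefs = model.named_steps["ridge"].coef_
print(dict(zip(["f0", "f1", "f2", "f3"], coefs.round(3))))
```

Swapping Ridge for Lasso in the same pipeline is a one-line change, which makes it easy to compare the two under identical folds.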

Verification guidance

Check row counts before and after every join
Check date formats match across sources before joining
Run assumption checks (normality, homoscedasticity) before statistical tests
Report effect sizes alongside p-values
Apply multiple comparison corrections for pairwise tests
Report cross-validation mean AND standard deviation
Keep leaky features (those derived from the target or unavailable at prediction time) out of the prediction model
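
The statistical checks above (assumptions first, effect size with every p-value, correction on pairwise tests) can be sketched as follows. This is an illustration on synthetic scores for three hypothetical groups, not the real review data. The 0.05 thresholds, the pairwise Mann-Whitney U tests, and the Holm correction are assumed choices; Dunn's test or Tukey HSD are reasonable alternatives.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Synthetic stand-in scores for three hypothetical groups.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=60) for m in (8.0, 8.1, 7.4)]
k = len(groups)
n_total = sum(len(g) for g in groups)

# Assumption checks: Shapiro-Wilk per group (normality), Levene (equal variances).
normal = all(stats.shapiro(g).pvalue > 0.05 for g in groups)
equal_var = stats.levene(*groups).pvalue > 0.05

if normal and equal_var:
    stat, p = stats.f_oneway(*groups)
    # Eta squared recovered from F: SS_between / SS_total.
    effect = (stat * (k - 1)) / (stat * (k - 1) + (n_total - k))
    test_name, effect_name = "ANOVA", "eta^2"
else:
    stat, p = stats.kruskal(*groups)
    # Epsilon squared for Kruskal-Wallis: H / (n - 1).
    effect = stat / (n_total - 1)
    test_name, effect_name = "Kruskal-Wallis", "epsilon^2"
print(f"{test_name}: stat={stat:.2f}, p={p:.4f}, {effect_name}={effect:.3f}")

# Pairwise comparisons with a Holm step-down correction.
pairs = list(combinations(range(k), 2))
raw_p = [
    stats.mannwhitneyu(groups[i], groups[j], alternative="two-sided").pvalue
    for i, j in pairs
]
m = len(raw_p)
adj_p = np.empty(m)
running_max = 0.0
for rank, idx in enumerate(np.argsort(raw_p)):
    # Smallest p is multiplied by m, the next by m - 1, and so on;
    # the running maximum keeps adjusted values monotone.
    running_max = max(running_max, (m - rank) * raw_p[idx])
    adj_p[idx] = min(1.0, running_max)

for (i, j), rp, ap in zip(pairs, raw_p, adj_p):
    print(f"group {i} vs group {j}: raw p={rp:.4f}, Holm p={ap:.4f}")
```

Record in the methodology memo which branch ran and why, so the reviewer in T5 can see the assumption checks that led to the chosen test.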

Commit convention

Commit after completing each ticket. Use descriptive messages that explain what analytical work was done, not just what files changed.
