Glow Republic Demand Forecasting
Client
Eunji Cho, Head of Merchandising Analytics at Glow Republic -- a mid-size Korean beauty retailer in Seoul with 12 physical stores and an e-commerce platform. ~2,000 SKUs from 150 K-beauty brands.
What you are building
A demand forecasting model that predicts weekly demand for Glow Republic's top 200 SKUs using sales history and social media data. The model must separate seasonal products (predictable patterns) from trend-driven products (viral, unpredictable) and provide actionable ordering recommendations for the buying team.
Tech stack
- Python 3.11+ (conda "ds" environment)
- Jupyter Notebook
- pandas (data manipulation)
- scikit-learn (RandomForestRegressor or GradientBoostingRegressor, train_test_split, feature_importances_, MAE, RMSE)
- matplotlib / seaborn (visualization)
- scipy (statistical checks)
- Claude Code (AI direction)
- Git / GitHub (version control)
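A minimal sketch of the modeling loop this stack implies (RandomForestRegressor, MAE/RMSE via scikit-learn metrics, feature_importances_). All data here is synthetic, purely to show the pieces fitting together; real features come from the analysis itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 0.1, 200)

# Temporal-style split: first 160 rows train, last 40 test (never shuffled).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:160], y[:160])
pred = model.predict(X[160:])

mae = mean_absolute_error(y[160:], pred)
rmse = np.sqrt(mean_squared_error(y[160:], pred))
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
print(dict(zip(["f0", "f1", "f2"], model.feature_importances_.round(2))))
```

Note that `train_test_split` is deliberately absent: its default random shuffle is exactly what the temporal-split requirement below rules out.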
File structure
p5/
  CLAUDE.md                      <- this file
  materials/
    sales-data.csv               <- 24 months of daily sales, ~200K rows
    social-media-mentions.csv    <- daily social media counts, ~146K rows
    data-dictionary.md           <- column definitions for both datasets
    methodology-memo-template.md <- preparation and analysis documentation
  analysis.ipynb                 <- your Jupyter notebook (created during work)
  findings-summary.md            <- deliverable for the buying team (created during work)
  methodology-memo.md            <- completed methodology memo (created during work)
  decision-record.md             <- preparation decision documentation (created during work)
Key materials
- sales-data.csv -- daily sales by SKU. 200 SKUs, 24 months, ~200K rows. Contains units_sold, revenue, channel, category, promotion flags.
- social-media-mentions.csv -- daily mention counts from Instagram and TikTok. Same 200 SKUs and date range. IMPORTANT: mention counts are same-day (Tuesday's count is Tuesday's mentions).
- data-dictionary.md -- explains all columns in both datasets.
- methodology-memo-template.md -- template for documenting preparation and analysis decisions. Has a dedicated "Preparation Decisions" section with subsections for feature engineering, temporal splitting, leakage assessment, and data quality.
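Because the mention counts are same-day, using them directly as features leaks the target period into the model. One way to build honest lagged features with pandas (column names `sku`, `date`, `mention_count` are assumptions; check data-dictionary.md for the real ones):

```python
import pandas as pd

# Toy stand-in for social-media-mentions.csv; column names are hypothetical.
mentions = pd.DataFrame({
    "sku": ["A1", "A1", "A1", "B2", "B2", "B2"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
    "mention_count": [5, 8, 13, 2, 2, 40],
})

mentions = mentions.sort_values(["sku", "date"])
# Shift within each SKU so Tuesday's feature is Monday's mentions.
mentions["mentions_lag1"] = mentions.groupby("sku")["mention_count"].shift(1)
# A rolling sum of the *lagged* series smooths daily noise without leaking.
mentions["mentions_lag_7d"] = (
    mentions.groupby("sku")["mentions_lag1"]
    .transform(lambda s: s.rolling(7, min_periods=1).sum())
)
```

The key detail is shifting inside the `groupby("sku")` so one SKU's history never bleeds into another's first row.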
Tasks
- Profile data -- Load both datasets, check shapes, date ranges, distributions, null patterns. Understand the temporal structure.
- Investigate leakage -- Check whether social media features are same-day or lagged. Same-day is leakage. Create lagged features. Implement temporal train/test split.
- Engineer features -- Handle stockout-censored zeros. Separate seasonal from trend-driven products. Design feature sets per product type. Check multicollinearity.
- Build model -- Fit regression model on honest features with temporal split. Evaluate with MAE/RMSE. Compare performance by product type. Build deliberate leakage comparison.
- Translate findings -- Run a cross-model review. Translate forecasts into buying-team language. Address all five of Eunji's requirements.
- Deliver and close -- Send findings to Eunji. Handle scope extension. Write decision record. Commit and push.
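The temporal train/test split in the tasks above can be sketched like this, assuming a weekly feature table with a `week_start` column (the column name and 80/20 cut point are assumptions, not requirements):

```python
import pandas as pd

# Hypothetical weekly feature table; real columns come from analysis.ipynb.
weekly = pd.DataFrame({
    "week_start": pd.date_range("2023-01-02", periods=104, freq="W-MON"),
    "units_sold": range(104),
})

# Sort by date, then cut chronologically -- never use a random split here.
weekly = weekly.sort_values("week_start").reset_index(drop=True)
split = int(len(weekly) * 0.8)
train, test = weekly.iloc[:split], weekly.iloc[split:]

# Verification target: all training dates must precede all test dates.
assert train["week_start"].max() < test["week_start"].min()
```

The final assert is cheap to keep in the notebook permanently; it fails loudly if a later refactor reintroduces shuffling.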
Verification targets
- Temporal split: all training dates must precede all test dates
- No same-day social media features in the final model
- MAE/RMSE computed on temporal test set (not on training data)
- Performance evaluated separately for seasonal and trend-driven products
- All five of Eunji's requirements addressed in the findings summary
- Leakage comparison demonstrates the metric gap between honest and cheating models
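The leakage-comparison target can be demonstrated with a sketch like the following: fit the same model twice, once on a same-day (leaky) feature and once on a lagged (honest) one, and compare test MAE. All data here is fabricated for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 300
demand = rng.poisson(20, n).astype(float)
same_day = demand + rng.normal(0, 1, n)            # leaky: today's mentions
lagged = np.roll(demand, 1) + rng.normal(0, 5, n)  # honest: yesterday's signal

split = int(n * 0.8)  # temporal split: first 80% of days train, rest test
maes = {}
for name, feat in [("cheating", same_day), ("honest", lagged)]:
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(feat[:split].reshape(-1, 1), demand[:split])
    pred = model.predict(feat[split:].reshape(-1, 1))
    maes[name] = mean_absolute_error(demand[split:], pred)
print(maes)  # the cheating MAE should be markedly lower
```

The gap between the two MAEs is the number to report: it quantifies how much of the "accuracy" of a same-day model is illusory.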
Commit convention
Commit after each major analytical milestone with a message describing the decision made. Examples: "identify same-day leakage in social media features", "implement temporal train/test split", "separate seasonal from trend-driven products."