Glow Republic Demand Forecasting
Client
Eunji Cho, Head of Merchandising Analytics at Glow Republic -- a mid-size Korean beauty retailer in Seoul with 12 physical stores and an e-commerce platform. ~2,000 SKUs from 150 K-beauty brands.
What you are building
A demand forecasting model that predicts weekly demand for Glow Republic's top 200 SKUs using sales history and social media data. The model must separate seasonal products (predictable patterns) from trend-driven products (viral, unpredictable) and provide actionable ordering recommendations for the buying team.
Tech stack
- Python 3.11+ (conda "ds" environment)
- Jupyter Notebook
- pandas (data manipulation)
- scikit-learn (RandomForestRegressor or GradientBoostingRegressor, train_test_split, feature_importances_, MAE, RMSE)
- matplotlib / seaborn (visualization)
- scipy (statistical checks)
- Claude Code (AI direction)
- Git / GitHub (version control)
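A minimal sketch of the modeling loop this stack implies (RandomForestRegressor, MAE/RMSE via scikit-learn metrics, feature_importances_). All data here is synthetic, purely to show the pieces fitting together; real features come from the analysis itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 0.1, 200)

# Temporal-style split: first 160 rows train, last 40 test (never shuffled).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:160], y[:160])
pred = model.predict(X[160:])

mae = mean_absolute_error(y[160:], pred)
rmse = np.sqrt(mean_squared_error(y[160:], pred))
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
print(dict(zip(["f0", "f1", "f2"], model.feature_importances_.round(2))))
```

Note that `train_test_split` is deliberately absent: its default random shuffle is exactly what the temporal-split requirement below rules out.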
File structure
p5/
  CLAUDE.md                      <- this file
  materials/
    sales-data.csv               <- 24 months of daily sales, ~200K rows
    social-media-mentions.csv    <- daily social media counts, ~146K rows
    data-dictionary.md           <- column definitions for both datasets
    methodology-memo-template.md <- preparation and analysis documentation
  analysis.ipynb                 <- your Jupyter notebook (created during work)
  findings-summary.md            <- deliverable for the buying team (created during work)
  methodology-memo.md            <- completed methodology memo (created during work)
  decision-record.md             <- preparation decision documentation (created during work)
Key materials
- sales-data.csv -- daily sales by SKU. 200 SKUs, 24 months, ~200K rows. Contains units_sold, revenue, channel, category, promotion flags.
- social-media-mentions.csv -- daily mention counts from Instagram and TikTok. Same 200 SKUs and date range. IMPORTANT: mention counts are same-day (Tuesday's count is Tuesday's mentions).
- data-dictionary.md -- explains all columns in both datasets.
- methodology-memo-template.md -- template for documenting preparation and analysis decisions. Has a dedicated "Preparation Decisions" section with subsections for feature engineering, temporal splitting, leakage assessment, and data quality.
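Because the mention counts are same-day, using them directly as features leaks the target period into the model. One way to build honest lagged features with pandas (column names `sku`, `date`, `mention_count` are assumptions; check data-dictionary.md for the real ones):

```python
import pandas as pd

# Toy stand-in for social-media-mentions.csv; column names are hypothetical.
mentions = pd.DataFrame({
    "sku": ["A1", "A1", "A1", "B2", "B2", "B2"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
    "mention_count": [5, 8, 13, 2, 2, 40],
})

mentions = mentions.sort_values(["sku", "date"])
# Shift within each SKU so Tuesday's feature is Monday's mentions.
mentions["mentions_lag1"] = mentions.groupby("sku")["mention_count"].shift(1)
# A rolling sum of the *lagged* series smooths daily noise without leaking.
mentions["mentions_lag_7d"] = (
    mentions.groupby("sku")["mentions_lag1"]
    .transform(lambda s: s.rolling(7, min_periods=1).sum())
)
```

The key detail is shifting inside the `groupby("sku")` so one SKU's history never bleeds into another's first row.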
Tasks
- Profile data -- Load both datasets, check shapes, date ranges, distributions, null patterns. Understand the temporal structure.
- Investigate leakage -- Check whether social media features are same-day or lagged. Same-day is leakage. Create lagged features. Implement temporal train/test split.
- Engineer features -- Handle stockout-censored zeros. Separate seasonal from trend-driven products. Design feature sets per product type. Check multicollinearity.
- Build model -- Fit regression model on honest features with temporal split. Evaluate with MAE/RMSE. Compare performance by product type. Build deliberate leakage comparison.
- Translate findings -- Run a cross-model review. Translate forecasts into buying-team language. Address all five of Eunji's requirements.
- Deliver and close -- Send findings to Eunji. Handle scope extension. Write decision record. Commit and push.
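The temporal train/test split in the tasks above can be sketched like this, assuming a weekly feature table with a `week_start` column (the column name and 80/20 cut point are assumptions, not requirements):

```python
import pandas as pd

# Hypothetical weekly feature table; real columns come from analysis.ipynb.
weekly = pd.DataFrame({
    "week_start": pd.date_range("2023-01-02", periods=104, freq="W-MON"),
    "units_sold": range(104),
})

# Sort by date, then cut chronologically -- never use a random split here.
weekly = weekly.sort_values("week_start").reset_index(drop=True)
split = int(len(weekly) * 0.8)
train, test = weekly.iloc[:split], weekly.iloc[split:]

# Verification target: all training dates must precede all test dates.
assert train["week_start"].max() < test["week_start"].min()
```

The final assert is cheap to keep in the notebook permanently; it fails loudly if a later refactor reintroduces shuffling.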
Verification targets
- Temporal split: all training dates must precede all test dates
- No same-day social media features in the final model
- MAE/RMSE computed on temporal test set (not on training data)
- Performance evaluated separately for seasonal and trend-driven products
- All five of Eunji's requirements addressed in the findings summary
- Leakage comparison demonstrates the metric gap between honest and cheating models
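The leakage-comparison target can be demonstrated with a sketch like the following: fit the same model twice, once on a same-day (leaky) feature and once on a lagged (honest) one, and compare test MAE. All data here is fabricated for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 300
demand = rng.poisson(20, n).astype(float)
same_day = demand + rng.normal(0, 1, n)            # leaky: today's mentions
lagged = np.roll(demand, 1) + rng.normal(0, 5, n)  # honest: yesterday's signal

split = int(n * 0.8)  # temporal split: first 80% of days train, rest test
maes = {}
for name, feat in [("cheating", same_day), ("honest", lagged)]:
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(feat[:split].reshape(-1, 1), demand[:split])
    pred = model.predict(feat[split:].reshape(-1, 1))
    maes[name] = mean_absolute_error(demand[split:], pred)
print(maes)  # the cheating MAE should be markedly lower
```

The gap between the two MAEs is the number to report: it quantifies how much of the "accuracy" of a same-day model is illusory.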
Commit convention
Commit after each major analytical milestone with a message describing the decision made. Examples: "identify same-day leakage in social media features", "implement temporal train/test split", "separate seasonal from trend-driven products."