
CLAUDE.md

Glow Republic Demand Forecasting

Client

Eunji Cho, Head of Merchandising Analytics at Glow Republic -- a mid-size Korean beauty retailer in Seoul with 12 physical stores and an e-commerce platform. ~2,000 SKUs from 150 K-beauty brands.

What you are building

A demand forecasting model that predicts weekly demand for Glow Republic's top 200 SKUs using sales history and social media data. The model must separate seasonal products (predictable patterns) from trend-driven products (viral, unpredictable) and provide actionable ordering recommendations for the buying team.

Tech stack

  • Python 3.11+ (conda "ds" environment)
  • Jupyter Notebook
  • pandas (data manipulation)
  • scikit-learn (RandomForestRegressor or GradientBoostingRegressor, train_test_split, feature_importances_, MAE, RMSE)
  • matplotlib / seaborn (visualization)
  • scipy (statistical checks)
  • Claude Code (AI direction)
  • Git / GitHub (version control)
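The scikit-learn pieces named above fit together roughly as follows; a minimal sketch on synthetic data (array shapes, hyperparameters, and the random data are placeholders, not project choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder arrays standing in for the real feature matrix / weekly unit sales.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 4)), rng.poisson(20, size=100)
X_test, y_test = rng.normal(size=(30, 4)), rng.poisson(20, size=30)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # works across sklearn versions

# feature_importances_ is available after fitting, one weight per column.
importances = model.feature_importances_
```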

File structure

p5/
  CLAUDE.md              <- this file
  materials/
    sales-data.csv       <- 24 months of daily sales, ~200K rows
    social-media-mentions.csv <- daily social media counts, ~146K rows
    data-dictionary.md   <- column definitions for both datasets
    methodology-memo-template.md <- preparation and analysis documentation
  analysis.ipynb         <- your Jupyter notebook (created during work)
  findings-summary.md    <- deliverable for the buying team (created during work)
  methodology-memo.md    <- completed methodology memo (created during work)
  decision-record.md     <- preparation decision documentation (created during work)

Key materials

  • sales-data.csv -- daily sales by SKU. 200 SKUs, 24 months, ~200K rows. Contains units_sold, revenue, channel, category, promotion flags.
  • social-media-mentions.csv -- daily mention counts from Instagram and TikTok. Same 200 SKUs and date range. IMPORTANT: mention counts are same-day (Tuesday's count is Tuesday's mentions).
  • data-dictionary.md -- explains all columns in both datasets.
  • methodology-memo-template.md -- template for documenting preparation and analysis decisions. Has a dedicated "Preparation Decisions" section with subsections for feature engineering, temporal splitting, leakage assessment, and data quality.
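Because the mention counts are same-day, any feature built from them must be shifted into the past before modeling. A sketch of per-SKU lagging with pandas, using a toy frame and assumed column names (`sku`, `date`, `mention_count`) -- check data-dictionary.md for the real ones:

```python
import pandas as pd

# Toy stand-in for social-media-mentions.csv.
sm = pd.DataFrame({
    "sku": ["A"] * 5 + ["B"] * 5,
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"] * 2
    ),
    "mention_count": [10, 12, 30, 8, 9, 3, 4, 5, 6, 7],
})
sm = sm.sort_values(["sku", "date"])

# Shift within each SKU so every row sees only past mentions: with a 1-day lag,
# Tuesday's row carries Monday's count, removing the same-day leakage.
sm["mentions_lag1"] = sm.groupby("sku")["mention_count"].shift(1)

# A trailing rolling mean built on the shifted series stays leakage-free too.
sm["mentions_roll3"] = (
    sm.groupby("sku")["mention_count"]
    .transform(lambda s: s.shift(1).rolling(3).mean())
)
```

The first row of each SKU becomes NaN after shifting, which is the expected cost of honest features at the start of the history.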

Tasks

  1. Profile data -- Load both datasets, check shapes, date ranges, distributions, null patterns. Understand the temporal structure.
  2. Investigate leakage -- Check whether social media features are same-day or lagged. Same-day mention counts are leakage: they would not be available at prediction time. Create lagged features. Implement a temporal train/test split.
  3. Engineer features -- Handle stockout-censored zeros. Separate seasonal from trend-driven products. Design feature sets per product type. Check multicollinearity.
  4. Build model -- Fit a regression model on honest features with the temporal split. Evaluate with MAE/RMSE on the held-out period. Compare performance by product type. Build a deliberate leakage comparison to quantify the gap.
  5. Translate findings -- Cross-model review. Translate forecasts into buying team language. Address all five of Eunji's requirements.
  6. Deliver and close -- Send findings to Eunji. Handle scope extension. Write decision record. Commit and push.
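The temporal split in steps 2 and 4 means cutting on the time index rather than shuffling rows (plain `train_test_split` would scatter future weeks into training). A minimal sketch on a synthetic weekly table; the column names are illustrative:

```python
import pandas as pd

# Hypothetical weekly feature table; 'week' is the time index.
df = pd.DataFrame({
    "week": pd.date_range("2023-01-02", periods=104, freq="W-MON"),
    "units_sold": range(104),
})

# Hold out the final 20% of weeks as the test period.
cutoff = df["week"].quantile(0.8)
train = df[df["week"] <= cutoff]
test = df[df["week"] > cutoff]

# The defining property of a temporal split:
assert train["week"].max() < test["week"].min()
```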

Verification targets

  • Temporal split: all training dates must precede all test dates
  • No same-day social media features in the final model
  • MAE/RMSE computed on temporal test set (not on training data)
  • Performance evaluated separately for seasonal and trend-driven products
  • All five of Eunji's requirements addressed in the findings summary
  • Leakage comparison demonstrates the metric gap between honest and cheating models
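Most of these targets can be asserted mechanically in the notebook rather than checked by eye. A sketch, assuming `train`/`test` DataFrames with a `date` column and a `feature_cols` list of model inputs (all names here are stand-ins for whatever the analysis actually produces):

```python
import pandas as pd

# Stand-in objects; in the notebook these come from the actual split and model.
train = pd.DataFrame({"date": pd.date_range("2023-01-01", periods=10)})
test = pd.DataFrame({"date": pd.date_range("2023-01-11", periods=5)})
feature_cols = ["mentions_lag1", "mentions_roll3", "promo_flag"]

# Target 1: every training date precedes every test date.
assert train["date"].max() < test["date"].min()

# Target 2: no same-day social media column slipped into the model inputs
# (the banned names are assumptions -- match them to the real raw columns).
banned = {"mention_count", "mentions_same_day"}
assert not banned.intersection(feature_cols)
```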

Commit convention

Commit after each major analytical milestone with a message describing the decision made. Examples: "identify same-day leakage in social media features", "implement temporal train/test split", "separate seasonal from trend-driven products."