Step 1: Investigate the zero-demand periods
Direct AI to look for SKUs with extended periods of zero units sold -- at least two consecutive weeks of zeros. How many SKUs have these patterns? Which ones?
Zeros in sales data are not always zero demand. If Glow Republic ran out of a product, the sales are zero because there was nothing to sell -- not because nobody wanted it. Training a model on these zeros teaches it that demand was zero when demand was actually unknown.
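The search for zero runs could be sketched as follows. This is a minimal pandas example, assuming a weekly table with hypothetical columns sku, week, and units, and assuming every SKU has a row for every week (a missing week would need to be filled in first). The toy numbers are illustrative only.

```python
import pandas as pd

def find_zero_runs(weekly, min_weeks=2):
    """Return one row per run of at least `min_weeks` consecutive zero-sales weeks."""
    df = weekly.sort_values(["sku", "week"]).copy()
    df["is_zero"] = df["units"].eq(0)
    # A new run starts when is_zero differs from the previous week of the same SKU
    # (the first week of each SKU compares against NaN, so it always starts a run).
    new_run = df["is_zero"].ne(df.groupby("sku")["is_zero"].shift())
    df["run_id"] = new_run.cumsum()
    runs = (
        df[df["is_zero"]]
        .groupby(["sku", "run_id"])
        .agg(start=("week", "min"), end=("week", "max"), n_weeks=("week", "count"))
        .reset_index()
    )
    keep = runs["n_weeks"] >= min_weeks
    return runs[keep].drop(columns="run_id").reset_index(drop=True)

# Toy data for illustration only (not real sales figures).
weeks = list(pd.date_range("2025-09-01", periods=6, freq="W-MON"))
weekly = pd.DataFrame({
    "sku": ["toner"] * 6 + ["serum"] * 6,
    "week": weeks * 2,
    "units": [40, 0, 0, 0, 35, 38, 12, 0, 15, 11, 0, 14],
})
zero_runs = find_zero_runs(weekly)
print(zero_runs)
```

In the toy data only the toner qualifies: the serum has isolated zero weeks, which are more plausibly slow weeks than stockouts.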
Ask Eunji about unusual patterns in the data. She responds in quick Slack messages:
oh yeah, we ran out of the hyaluronic toner for like three weeks in September
couldn't get it from the supplier
and the snail cream -- we stopped carrying that for a month when the trend died, then it came back
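She knows about the stockouts but does not see the analytical implication. Those zeros are censored demand: during a stockout, the recorded sales are capped by what was available to sell, so the true demand is not observed.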
Step 2: Handle censored demand
Direct AI to flag the stockout periods you identified. The safest approach: exclude those rows from training entirely. The model should not learn from periods where sales were zero due to supply constraints rather than demand.
Document this in the methodology memo under "Data Quality Issues." Note which SKUs were affected, what periods were excluded, and why.
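One way to flag and exclude the stockout rows is sketched below. The column names (sku, date, units) and the stockout window are assumptions for illustration; Eunji only said "like three weeks in September," so the exact dates would need to be confirmed before use.

```python
import pandas as pd

def flag_censored(sales, stockouts):
    """Add a boolean 'censored' column marking rows inside a known stockout window."""
    out = sales.copy()
    out["censored"] = False
    for row in stockouts.itertuples(index=False):
        in_window = out["date"].between(row.start, row.end)  # inclusive on both ends
        out.loc[(out["sku"] == row.sku) & in_window, "censored"] = True
    return out

# Hypothetical window; the real start and end dates must be confirmed.
stockouts = pd.DataFrame({
    "sku": ["hyaluronic_toner"],
    "start": pd.to_datetime(["2025-09-08"]),
    "end": pd.to_datetime(["2025-09-28"]),
})

# Toy daily sales for two SKUs.
dates = list(pd.date_range("2025-09-01", "2025-10-05", freq="D"))
sales = pd.DataFrame({
    "sku": ["hyaluronic_toner"] * len(dates) + ["snail_cream"] * len(dates),
    "date": dates * 2,
    "units": 5,
})

flagged = flag_censored(sales, stockouts)
train = flagged[~flagged["censored"]]  # rows the model is allowed to learn from
print(len(flagged), len(train))
```

Keeping the flag as a column, rather than deleting rows immediately, leaves an audit trail that can be summarized in the memo (which SKUs, which dates, how many rows).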
Eunji also mentioned a platform change about eight months ago. Direct AI to look for data quality issues around May 2025. You will find about two weeks of missing or sparse data from the e-commerce platform migration. Flag these rows and note them in the methodology memo.
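A rough check for sparse days around the migration might look like the sketch below. It counts how many distinct SKUs have a record on each day and flags days that fall below a share of the typical count. The 50% cutoff, the column names, and the toy gap are all assumptions, not values from the actual data.

```python
import pandas as pd

def sparse_days(sales, start, end, min_share=0.5):
    """Return days in [start, end] whose SKU coverage is below min_share of the median."""
    coverage = sales.groupby("date")["sku"].nunique()
    # Reindex to every calendar day so fully missing days show up as zero coverage.
    all_days = pd.date_range(sales["date"].min(), sales["date"].max(), freq="D")
    coverage = coverage.reindex(all_days, fill_value=0)
    typical = coverage.median()
    window = coverage.loc[start:end]
    return window[window < min_share * typical]

# Toy data: three SKUs recorded daily, with one week removed to mimic a gap.
dates = pd.date_range("2025-04-01", "2025-06-30", freq="D")
sales = pd.DataFrame(
    [(d, sku, 5) for d in dates for sku in ["toner", "serum", "sunscreen"]],
    columns=["date", "sku", "units"],
)
gap = sales["date"].between("2025-05-12", "2025-05-18")
sales = sales[~gap]

flagged_days = sparse_days(sales, "2025-05-01", "2025-05-31")
print(flagged_days)
```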
Step 3: Engineer seasonal features
Create calendar features that capture regular patterns:
month -- captures seasonal cycles
day_of_week -- captures within-week patterns (weekends vs weekdays)
is_holiday -- flag for major Korean holidays (Chuseok, Lunar New Year) and seasonal campaigns (summer)
These features are different from the social media lags. Calendar features capture regular, repeating patterns. Social media lags capture trend signals. Both are legitimate -- they describe different aspects of demand.
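The three calendar features could be built roughly like this. The sketch assumes a datetime column named date and takes the holiday and campaign dates as an input list; the two 2025 dates shown are examples and should be checked against an official calendar, and campaign dates would have to come from the business.

```python
import pandas as pd

def add_calendar_features(df, holiday_dates):
    """Add month, day_of_week, and is_holiday columns derived from the date column."""
    out = df.copy()
    out["month"] = out["date"].dt.month
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    holidays = pd.to_datetime(holiday_dates)
    out["is_holiday"] = out["date"].dt.normalize().isin(holidays).astype(int)
    return out

# Example dates only: Lunar New Year's Day and Chuseok day in 2025.
# Verify against an official calendar and add campaign dates as needed.
holiday_dates = ["2025-01-29", "2025-10-06"]

sample = pd.DataFrame({"date": pd.date_range("2025-10-04", "2025-10-08", freq="D")})
featured = add_calendar_features(sample, holiday_dates)
print(featured)
```

All three columns depend only on the calendar, so they are known in advance for any future date the buying team needs to plan for.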
Step 4: Separate seasonal from trend-driven products
Eunji mentioned that sunscreen is different from a viral serum. She is right, and the data confirms it.
Direct AI to analyze the sales patterns across SKUs. Some products have clear seasonal cycles -- sunscreen peaks in summer, moisturizer peaks in winter, masks peak around Chuseok and Lunar New Year. Others have flat baselines with occasional dramatic spikes that do not follow any calendar pattern.
Group the products into two categories: seasonal (regular, repeatable patterns) and trend-driven (irregular, social-media-dependent spikes). This is not a precise classification -- some products fall in between. The point is that these two types need different feature sets and have different forecast reliability.
For seasonal products, historical patterns and calendar features carry most of the predictive signal. For trend-driven products, lagged social media mentions are more important. A single model treats all SKUs the same and averages out these differences.
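One possible heuristic for the grouping, offered as a sketch rather than the method this project prescribes: measure how much of each SKU's variance is explained by its month-of-year averages. The function name, the column names, and the 0.6 cutoff are assumptions, and the measure needs at least two years of history per SKU (with one year, every month "explains" itself perfectly).

```python
import pandas as pd

def seasonal_strength(monthly):
    """Share of each SKU's sales variance explained by month-of-year means (0 to 1)."""
    scores = {}
    for sku, g in monthly.groupby("sku"):
        total_var = g["units"].var(ddof=0)
        if total_var == 0:
            scores[sku] = 0.0
            continue
        month_means = g.groupby(g["month"].dt.month)["units"].transform("mean")
        residual_var = ((g["units"] - month_means) ** 2).mean()
        scores[sku] = 1 - residual_var / total_var
    return pd.Series(scores)

# Toy data: sunscreen repeats the same summer peak; serum is flat with one spike.
months = list(pd.date_range("2023-01-01", periods=24, freq="MS"))
sunscreen_pattern = [5, 5, 10, 20, 40, 80, 100, 90, 40, 15, 5, 5]
serum_units = [10] * 24
serum_units[15] = 200  # a single viral spike
monthly = pd.DataFrame({
    "sku": ["sunscreen"] * 24 + ["serum"] * 24,
    "month": months * 2,
    "units": sunscreen_pattern * 2 + serum_units,
})

strength = seasonal_strength(monthly)
# 0.6 is an arbitrary illustrative cutoff; borderline SKUs deserve a manual look.
groups = strength.apply(lambda s: "seasonal" if s >= 0.6 else "trend-driven")
print(pd.DataFrame({"strength": strength.round(2), "group": groups}))
```

Scores near the cutoff are exactly the in-between products mentioned above, so the output is best treated as a starting point for review rather than a final label.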
Step 5: Design the feature set
This is the preparation design step. In previous projects, the preparation steps were more prescribed. This time, you decide what features to include.
Direct AI to build the feature sets. For seasonal products: calendar features, historical sales lag (same week last year, same month average), category. For trend-driven products: lagged social media mentions (1-day, 7-day average), influencer tag lags, historical sales lag.
Document each feature engineering decision in the methodology memo under "Feature Engineering." For each feature, note why it is included and confirm it would be available at prediction time. Any feature that depends on information not available when the buying team places orders is not legitimate.
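The availability requirement can be made concrete with lagged mention features. The sketch below assumes a daily table with hypothetical columns sku, date, and mentions, with one row per SKU per day and no gaps. Every feature is shifted so that a row only sees values from earlier days; if orders are placed further ahead than one day, the shift would need to grow to match that lead time.

```python
import pandas as pd

def add_mention_lags(df):
    """Add lagged mention features that use only information from before each row's date."""
    out = df.sort_values(["sku", "date"]).copy()
    by_sku = out.groupby("sku")["mentions"]
    out["mentions_lag_1d"] = by_sku.shift(1)
    # Shift first, then average, so the current day's mentions never leak in.
    out["mentions_avg_7d"] = by_sku.transform(lambda s: s.shift(1).rolling(7).mean())
    return out

# Toy data: mentions count up from 1 to 10 for a single SKU.
daily = pd.DataFrame({
    "sku": ["serum"] * 10,
    "date": pd.date_range("2025-11-01", periods=10, freq="D"),
    "mentions": list(range(1, 11)),
})
lagged = add_mention_lags(daily)
print(lagged)
```

The first rows of each SKU are NaN because there is not yet enough history; those rows are usually dropped before training rather than filled.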
Step 6: Check for multicollinearity and distributions
Direct AI to check whether features are highly correlated with each other. The 1-day lag and 7-day lag of mentions might overlap heavily. If a pair is correlated above roughly 0.8 to 0.9 in absolute value, consider keeping only one of the two or using the rolling average instead.
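A pairwise correlation check could be sketched as below. The feature names and toy values are made up for illustration, and 0.8 is used as the default cutoff from the range given above.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(features, threshold=0.8):
    """List feature pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = features.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skip self-pairs
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], round(float(corr.iloc[i, j]), 3)))
    return pairs

# Toy features: two nearly identical lag columns and one unrelated column.
x = np.arange(50, dtype=float)
features = pd.DataFrame({
    "mentions_lag_1d": x,
    "mentions_avg_7d": x + np.sin(x),
    "weekend_flag": (-1.0) ** np.arange(50),
})
pairs = high_corr_pairs(features)
print(pairs)
```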
Check feature distributions for outliers or extreme skew. Some features may need log transformation. Others may need outlier handling.
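The skew check and log transformation might look like this sketch. The skewness cutoff of 1.0 is a common rule of thumb, not a requirement from this project, and the column names and values are illustrative. log1p is used instead of a plain log so that zero counts stay defined.

```python
import numpy as np
import pandas as pd

def log_transform_skewed(features, columns, max_skew=1.0):
    """Add a log1p-transformed copy of each non-negative column with skewness above max_skew."""
    out = features.copy()
    skew = out[columns].skew()
    for col in columns:
        if skew[col] > max_skew and (out[col] >= 0).all():
            out[f"log_{col}"] = np.log1p(out[col])
    return out

# Toy data: mentions has one extreme spike; price is evenly spread.
features = pd.DataFrame({
    "mentions": [1, 2, 2, 3, 3, 4, 5, 8, 20, 500],
    "price": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
})
transformed = log_transform_skewed(features, ["mentions", "price"])
print(features.skew().round(2))
print(transformed.columns.tolist())
```

Adding the transformed column alongside the original, rather than overwriting it, keeps both available for comparison when the choice is written up in the memo.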
This is assumption checking applied to the feature set -- a step you do before fitting any model, not after. The features need to be appropriate for the model you plan to use.
Check: Stockout zeros handled. Products categorized. Feature set documented with rationale. Multicollinearity checked.