Step 1: Project setup
Open a terminal and start Claude Code:
cd ~/dev
claude
Paste this prompt:
Set up my project:
1. Create ~/dev/data-science/p5
2. Download the project materials from https://learnbydirectingai.dev/materials/datascience/p5/materials.zip and extract them into that folder
3. Read CLAUDE.md -- it's the project governance file
Claude will create the folder, download and extract the materials, and read through CLAUDE.md. That file has the full project context: the client, the deliverable, the tech stack, the task list, and the verification guidance.
Once Claude confirms it has read CLAUDE.md, you are set up.
Step 2: Read Eunji's Slack message
Open the project in the platform. Eunji Cho is the Head of Merchandising Analytics at Glow Republic -- a K-beauty retailer in Seoul with 12 stores and an e-commerce platform.
Her problem is specific and expensive. K-beauty trends move fast. A product goes from unknown to sold out in two weeks because of one viral social media post. Last quarter: 180 million won in expired inventory write-offs and 250 million won in missed sales from stockouts.
She has two years of daily sales data across about 200 SKUs, social media mention counts from Instagram and TikTok, influencer tag data, and seasonal calendars. She wants to predict demand by SKU at least a week ahead so the buying team can order the right quantities.
She also mentions that sunscreen and seasonal products are different from viral products. That distinction will matter more than it seems right now.
Step 3: Reply to Eunji
Below the Slack message, pick the reply option that confirms you will start by profiling the data.
Eunji responds quickly -- multiple short messages. She is enthusiastic that someone is looking at this, mentions the sunscreen stockout from last summer, and asks how soon she can see initial results. She communicates in fast Slack bursts, mixing K-beauty terminology with business urgency.
Step 4: Profile the sales data
Direct AI to load materials/sales-data.csv. Ask for the shape, column names and types, date range, null counts, and value distributions for key columns.
Read the output. You have about 145,000 rows of daily sales data spanning January 2024 through December 2025. Two hundred unique SKUs across ten product categories (cleanser, toner, essence, serum, ampoule, moisturizer, sunscreen, mask, lip, eye). Each row records one SKU's sales for one day: units sold, revenue in KRW, sales channel, and whether a promotion was active.
Notice the temporal structure. This is daily data over 24 months. That time dimension shapes everything about how you prepare and evaluate a forecasting model.
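The profiling pass above can be sketched in pandas. This is a sketch, not the course's exact code; the column names (`date`, `units_sold`) are assumptions based on the description:

```python
import pandas as pd

def profile_daily_csv(path: str) -> pd.DataFrame:
    """Print shape, dtypes, date range, and null counts for a daily CSV."""
    df = pd.read_csv(path, parse_dates=["date"])  # "date" column name is an assumption
    print("shape:", df.shape)
    print(df.dtypes)
    print("date range:", df["date"].min(), "to", df["date"].max())
    print("nulls per column:")
    print(df.isna().sum())
    return df

# sales = profile_daily_csv("materials/sales-data.csv")
# sales["units_sold"].describe()  # distribution of a key column (assumed name)
```

Whether you run this yourself or direct AI to run it, read the output the same way: confirm the shape, the dtypes, and the date range before trusting any downstream analysis.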
Step 5: Profile the social media data
Direct AI to load materials/social-media-mentions.csv. Get the shape, columns, date range, and mention count distributions.
This dataset covers the same 200 SKUs and the same date range. Each row has Instagram mention counts, TikTok mention counts, and influencer tag counts for one SKU on one day.
Look at the date column. The mention counts for each date correspond to that same day's social media activity. Take note of this -- you will come back to what it means for prediction.
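Because one viral post can spike a product's mentions, these counts tend to be heavy-tailed: the mean hides the spikes, while high quantiles reveal them. A sketch of that comparison (column names like `instagram_mentions` are assumptions):

```python
import pandas as pd

def mention_quantiles(path: str, cols: list[str]) -> pd.DataFrame:
    """Compare the median against extreme quantiles to see how spiky mentions are."""
    df = pd.read_csv(path, parse_dates=["date"])  # "date" column name is an assumption
    # A large gap between the 0.50 and 0.99 quantiles signals viral spikes.
    return df[cols].quantile([0.50, 0.90, 0.99])

# mention_quantiles("materials/social-media-mentions.csv",
#                   ["instagram_mentions", "tiktok_mentions", "influencer_tags"])
```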
Step 6: Read the data dictionary
Open materials/data-dictionary.md. It explains what each column represents in both datasets.
Pay attention to how the social media data is timestamped. The data dictionary notes that social media counts are same-day: the count for Tuesday includes mentions that occurred on Tuesday. Read that, note it, and move on. You will revisit this detail soon.
Check: Both datasets loaded. Date ranges confirmed. Temporal granularity understood.
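The check above can be automated with a quick sketch, assuming both files share `date` and `sku` columns, which the descriptions suggest but the data dictionary should confirm:

```python
import pandas as pd

def coverage_matches(sales_path: str, social_path: str) -> bool:
    """True if both datasets cover the same SKUs and the same date range."""
    sales = pd.read_csv(sales_path, parse_dates=["date"])
    social = pd.read_csv(social_path, parse_dates=["date"])
    same_skus = set(sales["sku"]) == set(social["sku"])  # "sku" column name is an assumption
    same_dates = (sales["date"].min() == social["date"].min()
                  and sales["date"].max() == social["date"].max())
    return same_skus and same_dates
```

If this returns False, find out why before modeling: a SKU or date gap between the two files will silently distort any features you build from the join.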