Learn by Directing AI
Unit 1

Astrid's message and the cold-start analysis

Step 1: Project setup

Open a terminal and start Claude Code:

cd ~/dev
claude

Paste this prompt:

Set up my project:
1. Create ~/dev/data-science/p7
2. Download the project materials from https://learnbydirectingai.dev/materials/datascience/p7/materials.zip and extract them into that folder
3. Read the data dictionary -- it describes all three datasets

Claude will create the folder, download and extract the materials, and read through the data dictionary. Notice something different from previous projects: there is no CLAUDE.md in the materials. Every earlier project handed you a governance file that gave AI context about the project from its first prompt. This time you will build that file yourself later.

Step 2: Read Astrid's message

Open the project in the platform. Astrid Lindqvist is the Research Director for the Air Quality Program at the Nordic Environmental Research Institute in Gothenburg.

Her Slack message is precise and direct. No small talk, no stories. The Swedish Environmental Protection Agency has commissioned an analysis: have particulate levels changed since the vehicle emission regulation took effect three years ago? Seven years of hourly readings from 40 monitoring stations across Stockholm, Gothenburg, and Malmo. PM2.5, NO2, ozone, SO2. Weather data at each station. About 2.5 million data points.

Her last line matters: "The analysis must be methodologically defensible and fully reproducible."

Step 3: Reply to Astrid

Write your own message to Astrid. There are no suggested replies this time -- you decide how to open the conversation.

Astrid responds within the hour. Concise, professional. She confirms the regulation took effect on January 1 three years ago. She sends the station documentation reference. Then she asks a pointed question: "What approach will you take to separate the regulation's effect from weather and seasonal variation?"

She does not wait for you to answer before adding: "The agency needs to know if the regulation worked. Not whether PM2.5 changed -- whether the regulation caused the change. Those are different questions."

That distinction will shape the entire analysis.

Step 4: Profile the air quality data

Direct AI to load materials/air-quality-data.csv. Ask for the shape, column names and types, date range, station distribution across the three cities, and basic statistics for the pollutant columns.

The file holds daily averages (not the hourly readings Astrid's message describes) from 40 monitoring stations spanning January 2019 through December 2025. Each row is one station on one day: PM2.5, NO2, ozone, and SO2 concentrations. Some readings are missing -- equipment maintenance gaps, about 2% of the data.
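
A quick arithmetic check helps when you confirm the shape AI reports. The sketch below computes how many station-day rows a complete grid would contain, using only the figures in the text (40 stations, January 2019 through December 2025). It assumes one row per station per day, with missing readings left as blank cells rather than dropped rows; if the file drops them, expect fewer rows. The hourly figure is only an interpretation of Astrid's "about 2.5 million data points" as one point per station-hour.

```python
from datetime import date


def expected_station_days(start: date, end: date, n_stations: int) -> int:
    """Rows expected if every station has one row for every day (inclusive range)."""
    n_days = (end - start).days + 1
    return n_days * n_stations


# Figures taken from the text: 40 stations, January 2019 through December 2025.
full_grid = expected_station_days(date(2019, 1, 1), date(2025, 12, 31), 40)

# One reading per station-hour, i.e. before any daily averaging (an interpretation).
hourly_grid = full_grid * 24

print(full_grid, hourly_grid)
```

If the row count AI reports is far from the station-day figure, ask it why before moving on.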

Look at the PM2.5 values across the three cities. Look at how they vary by season. Winter readings are substantially higher than summer readings. This seasonal pattern is part of the variation Astrid asked you to separate from the regulation's effect.
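
The seasonal comparison can be sketched as a group-by on city and season. This is a minimal illustration, not what AI will necessarily produce (it will most likely use pandas). The column names `station_id`, `city`, `date`, and `pm25`, the ISO date format, and blank cells for missing readings are all assumptions; check the data dictionary for the real ones. The sample rows are made up to show the call shape.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from statistics import mean

# Meteorological seasons by month number (Northern Hemisphere).
SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_pm25(rows):
    """Return ({(city, season): mean PM2.5}, share of rows with missing PM2.5)."""
    values = defaultdict(list)
    total = missing = 0
    for row in rows:
        total += 1
        raw = row["pm25"].strip()
        if raw == "":  # assumption: a blank cell marks a missing reading
            missing += 1
            continue
        month = date.fromisoformat(row["date"]).month
        values[(row["city"], SEASON[month])].append(float(raw))
    means = {key: mean(vals) for key, vals in values.items()}
    return means, (missing / total if total else 0.0)


# Made-up sample rows; the values are not from the project data.
sample = io.StringIO(
    "station_id,city,date,pm25\n"
    "S01,Stockholm,2019-01-15,12.0\n"
    "S01,Stockholm,2019-01-16,\n"
    "S01,Stockholm,2019-07-15,6.0\n"
    "S02,Stockholm,2019-01-15,14.0\n"
)
means, missing_share = seasonal_pm25(csv.DictReader(sample))
```

On the real file, compare the missing share against the roughly 2% described above.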

Step 5: Profile the weather data

Direct AI to load materials/weather-data.csv. It contains temperature, wind speed, and precipitation for each station on each day, covering the same station-date combinations as the air quality data.
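
The claim that both files share the same station-date combinations is worth verifying rather than trusting. A minimal sketch of that check follows. The key columns `station_id` and `date` are hypothetical names, and the rows are made up.

```python
def compare_keys(air_rows, weather_rows, key=("station_id", "date")):
    """Compare the sets of key tuples found in two row collections."""
    air = {tuple(row[k] for k in key) for row in air_rows}
    weather = {tuple(row[k] for k in key) for row in weather_rows}
    return {
        "matched": len(air & weather),
        "air_only": sorted(air - weather),      # keys with no weather row
        "weather_only": sorted(weather - air),  # keys with no air quality row
    }


# Made-up rows that deliberately disagree, to show what a mismatch looks like.
air_rows = [
    {"station_id": "S01", "date": "2019-01-01"},
    {"station_id": "S01", "date": "2019-01-02"},
]
weather_rows = [
    {"station_id": "S01", "date": "2019-01-01"},
    {"station_id": "S02", "date": "2019-01-01"},
]
result = compare_keys(air_rows, weather_rows)
```

If both files really cover the same combinations, `air_only` and `weather_only` come back empty.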

Look at the temperature patterns. Nordic winters are cold. Cold, calm days concentrate pollutants. This is a confounder -- a factor that affects PM2.5 readings independently of the regulation. The analysis must account for it.
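
One simple way to see whether temperature moves with PM2.5 in the joined data is a Pearson correlation. The sketch below implements it by hand so the arithmetic is visible; the numbers are made up and chosen to be perfectly linear. A correlation alone does not adjust for the confounder; it only shows that there is something to adjust for.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up values: colder days paired with higher PM2.5.
temperature_c = [-10, -5, 0, 5, 10]
pm25 = [20, 17, 14, 11, 8]
r = pearson_r(temperature_c, pm25)
```

A strongly negative `r` on the real data would be consistent with the pattern described above: colder days, higher readings.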

Step 6: Observe cold-start behavior

This step is deliberate. Direct AI to start preparing the data for analysis. Do not give it any analytical constraints. Just ask it to set up the data for a before-after comparison.

Watch what AI does. Does it use temporal splits or random splits? Does it check statistical assumptions before running tests? Does it report effect sizes alongside p-values, or just p-values? Does it mention confidence intervals?
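
To make the first question concrete, here is a minimal sketch of the two kinds of split. A temporal split keeps every row before a cutoff date on one side and every later row on the other, which is what a before-after comparison needs. A random split scatters both periods across both sides. The row layout and the cutoff are toy values for illustration, not the regulation date.

```python
import random
from datetime import date


def temporal_split(rows, cutoff):
    """Split rows into those dated before the cutoff and those on or after it."""
    before = [row for row in rows if row["date"] < cutoff]
    after = [row for row in rows if row["date"] >= cutoff]
    return before, after


def random_split(rows, frac=0.5, seed=0):
    """Shuffle a copy of the rows and cut it at the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * frac)
    return shuffled[:k], shuffled[k:]


# Toy rows: ten consecutive days with made-up readings.
rows = [{"date": date(2000, 1, d), "pm25": float(d)} for d in range(1, 11)]
cutoff = date(2000, 1, 6)  # toy cutoff, not the regulation date

before, after = temporal_split(rows, cutoff)
left, right = random_split(rows)
```

When you read AI's code, look for which of these shapes it resembles.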

Write down what you observe. You will come back to this later, and the comparison will matter.
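
For reference when you review your notes, this is roughly what reporting an effect size and a confidence interval looks like, as opposed to a bare p-value. The sketch computes the difference in means, Cohen's d with a pooled standard deviation, and a normal-approximation interval for the difference. It treats observations as independent and makes no adjustment for weather or season, so it illustrates what to report, not a defensible estimate of the regulation's effect. The input values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def effect_summary(before, after, level=0.95):
    """Difference in means (after - before), Cohen's d, and an approximate CI."""
    n1, n2 = len(before), len(after)
    v1, v2 = variance(before), variance(after)  # sample variances (n - 1)
    diff = mean(after) - mean(before)

    # Cohen's d: difference scaled by the pooled standard deviation.
    pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    cohens_d = diff / pooled_sd

    # Normal-approximation interval; reasonable only for large samples.
    se = sqrt(v1 / n1 + v2 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"diff": diff, "cohens_d": cohens_d, "ci": (diff - z * se, diff + z * se)}


# Made-up readings, only to show the call shape.
summary = effect_summary(before=[10.0, 12.0, 14.0], after=[7.0, 9.0, 11.0])
```

If AI's output stops at a p-value, these are the quantities that are missing.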

Check: Both datasets loaded. Row counts and column counts confirmed. Date range verified (7 years). Station distribution across three cities verified. AI's default behavior observed and noted (expect: random splits, no assumption checks, no effect sizes).