Step 1: Project setup
Open a terminal and start Claude Code:
cd ~/dev
claude
Paste this prompt:
Set up my project:
1. Create ~/dev/data-science/p6
2. Download the project materials from https://learnbydirectingai.dev/materials/datascience/p6/materials.zip and extract them into that folder
3. Read CLAUDE.md -- it's the project governance file
Claude will create the folder, download and extract the materials, and read through CLAUDE.md. That file has the full project context: the client, the deliverable, the tech stack, the task list, and the verification guidance.
Once Claude confirms it has read CLAUDE.md, you are set up.
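Claude handles these three tasks for you. If you want to see what the download-and-extract part amounts to, here is a minimal standard-library Python sketch. The URL and folder come from the prompt. The function names extract_materials and set_up_project are hypothetical, and the sketch assumes materials.zip is an ordinary ZIP archive.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL and destination folder are taken from the Step 1 prompt.
MATERIALS_URL = "https://learnbydirectingai.dev/materials/datascience/p6/materials.zip"
PROJECT_DIR = Path.home() / "dev" / "data-science" / "p6"


def extract_materials(zip_path: Path, project_dir: Path) -> list[str]:
    """Extract every member of the archive into project_dir and return the member names."""
    project_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(project_dir)
        return archive.namelist()


def set_up_project(project_dir: Path = PROJECT_DIR, url: str = MATERIALS_URL) -> list[str]:
    """Hypothetical helper: create the folder, download materials.zip into it, extract it."""
    project_dir.mkdir(parents=True, exist_ok=True)
    zip_path = project_dir / "materials.zip"
    urllib.request.urlretrieve(url, zip_path)  # requires network access
    return extract_materials(zip_path, project_dir)
```

Calling set_up_project() needs network access. The text does not say whether the archive unpacks into a top-level folder, so check the returned names for CLAUDE.md rather than assuming its path.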
Step 2: Read Hassan's email
Open the project in the platform. Hassan El-Amin is the Founder and Managing Director of Nile Compass Tours -- a growing tour operator in Cairo offering private cultural tours, multi-day Egypt itineraries, and Nile cruise packages.
His email is long and enthusiastic. He tells you about building the company from nothing, mentions a disastrous first tour involving a flat tire near the Valley of the Kings, and then gets to the point: bookings have grown from 1,200 three years ago to 3,000 last year. He does not know why. Marketing shift? Exchange rate? New Luxor packages?
His central request: "I want to understand our booking patterns."
Read that line carefully. It sounds clear. It is not. "Understand our booking patterns" is not a question type. It could mean half a dozen different things. That ambiguity will matter in the next unit.
He also mentions his silent partner -- an accountant who wants to see methodology, not just conclusions. Two audiences, two deliverables.
Step 3: Reply to Hassan
Below the email, pick a reply option. One starts with profiling the data. The other asks Hassan to clarify what he means by "understand."
Hassan responds within the hour with a second email -- longer than the first. He adds details: the marketing change happened exactly eighteen months ago. They shifted from print ads in travel magazines to Instagram influencer campaigns and Google Ads. He also mentions the Luxor premium packages launched about a year ago. His tone is slightly anxious -- he is making investment decisions and wants evidence, not guesswork.
Step 4: Profile the booking data
Direct AI to load materials/bookings.csv. Ask for the shape, column names and types, date range, and value distributions for key columns.
Read the output. You have about 6,300 rows of booking data spanning January 2023 through December 2025. Each row is one booking: when it was made, travel dates, tour type, traveler country, group size, how the traveler found the company, what they paid, booking status, and an optional review score.
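As a rough picture of what such a profile contains, here is a minimal Python sketch using only the standard library. It assumes the file is UTF-8 and that dates are ISO YYYY-MM-DD strings. The column names booking_date and tour_type are hypothetical placeholders (the text only names status and marketing_channel), and guess_type is a crude stand-in for real type inference.

```python
import csv
from collections import Counter
from datetime import date


def load_rows(path: str) -> list[dict[str, str]]:
    """Read a CSV file (assumed UTF-8) into a list of dicts; every value is a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def guess_type(values: list[str]) -> str:
    """Guess a column type by trying int, then float, then ISO date on non-empty values."""
    present = [v for v in values if v]
    if not present:
        return "empty"
    for name, parse in (("int", int), ("float", float), ("date", date.fromisoformat)):
        try:
            for v in present:
                parse(v)
            return name
        except ValueError:
            continue
    return "str"


def profile_rows(rows, date_column, categorical_columns):
    """Shape, guessed column types, date range, and value counts for chosen columns."""
    columns = list(rows[0]) if rows else []
    dates = [date.fromisoformat(r[date_column]) for r in rows if r[date_column]]
    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "types": {c: guess_type([r[c] for r in rows]) for c in columns},
        "date_range": (min(dates), max(dates)) if dates else None,
        "value_counts": {c: Counter(r[c] for r in rows) for c in categorical_columns},
    }
```

For example, profile_rows(load_rows("materials/bookings.csv"), "booking_date", ["tour_type", "status", "marketing_channel"]), with the real column names swapped in from the data dictionary.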
Notice the status field. Both confirmed and cancelled bookings are in the same table, so before counting anything you need to decide whether to exclude cancellations or analyze them separately.
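One way to handle the status field is to split the table by status and measure how much weight cancellations carry. A minimal sketch, assuming the column is called status and holds values such as "confirmed" and "cancelled"; case and whitespace are normalized because the exact spelling in the file is an assumption.

```python
from collections import Counter, defaultdict


def split_by_status(rows, status_column="status"):
    """Group rows by normalized status, e.g. {"confirmed": [...], "cancelled": [...]}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[status_column].strip().lower()].append(row)
    return dict(groups)


def cancellation_rate(rows, status_column="status"):
    """Cancelled / (confirmed + cancelled); any other status values are ignored."""
    counts = Counter(row[status_column].strip().lower() for row in rows)
    decided = counts["confirmed"] + counts["cancelled"]
    return counts["cancelled"] / decided if decided else 0.0
```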
Notice the marketing_channel field. Look at the data dictionary note about it. This field records how travelers say they found Nile Compass Tours. Self-reported. That detail matters more than it seems right now.
Direct AI to show booking counts by year. You should see growth: roughly 1,200 in 2023, 2,100 in 2024, 3,000 in 2025. This matches what Hassan described.
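Counting by year is a small grouping step. This sketch assumes a hypothetical ISO-formatted booking_date column and can count with or without cancellations. The text does not say which definition Hassan's figures use, so comparing both is cheap. The growth helper turns yearly counts into year-over-year fractions.

```python
from collections import Counter
from datetime import date


def bookings_by_year(rows, date_column="booking_date", status_column="status", confirmed_only=False):
    """Count bookings per calendar year of the (hypothetical) booking date column."""
    counts = Counter()
    for row in rows:
        if confirmed_only and row[status_column].strip().lower() != "confirmed":
            continue
        counts[date.fromisoformat(row[date_column]).year] += 1
    return dict(sorted(counts.items()))


def year_over_year_growth(counts_by_year):
    """Fractional growth versus the previous year; 0.75 means +75%."""
    years = sorted(counts_by_year)
    return {
        year: counts_by_year[year] / counts_by_year[prev] - 1
        for prev, year in zip(years, years[1:])
    }
```

With the approximate figures above, year_over_year_growth({2023: 1200, 2024: 2100, 2025: 3000}) gives 0.75 for 2024 and about 0.43 for 2025: the absolute increase stays at roughly 900 bookings a year while the percentage rate falls.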
Step 5: Profile the marketing spend data
Direct AI to load materials/marketing-spend.csv. Ask for the shape, column names, date range, and spend distributions by channel.
You have 144 rows: 36 months of data across four marketing channels -- Print, Instagram, Google Ads, and Travel Portals. Look at the spending patterns. Print spending is high in the first 18 months and then drops sharply. Instagram and Google Ads are low early on and then rise sharply. The shift happens around July 2024 -- eighteen months ago, exactly when Hassan said he changed strategy.
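To see the reallocation in numbers, compare each channel's average monthly spend before and after the change. A minimal sketch, assuming hypothetical columns month, channel, and spend (rows as produced by csv.DictReader on materials/marketing-spend.csv), ISO month strings, and a cutoff of July 2024 taken from the text.

```python
from collections import defaultdict
from statistics import mean


def mean_spend_before_after(rows, cutoff="2024-07", month_column="month",
                            channel_column="channel", spend_column="spend"):
    """Average monthly spend per channel before the cutoff month and from it onward.

    Compares ISO strings ("YYYY-MM" or "YYYY-MM-DD") lexicographically, which
    matches chronological order for zero-padded ISO dates.
    """
    buckets = defaultdict(lambda: {"before": [], "after": []})
    for row in rows:
        period = "before" if row[month_column] < cutoff else "after"
        buckets[row[channel_column]][period].append(float(row[spend_column]))
    return {
        channel: {period: (mean(values) if values else 0.0) for period, values in periods.items()}
        for channel, periods in buckets.items()
    }
```

If the description above holds, Print's "after" average should sit far below its "before" average, and the reverse for Instagram and Google Ads.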
This is a deliberate, abrupt reallocation of marketing budget, not a gradual shift. That timing will be important when you try to separate the effect of marketing from other factors driving growth.
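One rough way to check "abrupt, not gradual" is to track Print's share of total monthly spend and find the single month with the largest drop. If that one step accounts for most of the overall decline, the change was abrupt. A sketch assuming hypothetical columns month, channel, and spend; this is a simple heuristic, not a formal change-point method.

```python
from collections import defaultdict


def channel_share_by_month(rows, channel="Print", month_column="month",
                           channel_column="channel", spend_column="spend"):
    """Fraction of each month's total spend that went to one channel, in month order."""
    totals = defaultdict(float)
    chosen = defaultdict(float)
    for row in rows:
        spend = float(row[spend_column])
        totals[row[month_column]] += spend
        if row[channel_column] == channel:
            chosen[row[month_column]] += spend
    return {m: (chosen[m] / totals[m] if totals[m] else 0.0) for m in sorted(totals)}


def largest_monthly_drop(shares):
    """Month with the biggest fall in share versus the previous month, and the size of that fall."""
    months = list(shares)
    if len(months) < 2:
        raise ValueError("need at least two months")
    i = max(range(1, len(months)), key=lambda k: shares[months[k - 1]] - shares[months[k]])
    return months[i], shares[months[i - 1]] - shares[months[i]]
```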
Step 6: Read the data dictionary
Open materials/data-dictionary.md. It has two sections: one for the booking data, one for the marketing spend data.
Read the notes column carefully. The booking data's marketing_channel field has a critical note: "Self-reported by the traveler at booking time. This field may not accurately reflect how the traveler actually discovered Nile Compass Tours." This is not just a data quality warning. Self-reported attribution is systematically biased -- people who saw an Instagram ad but then searched on Google often say "Google." The self-reported channel underestimates the real impact of upstream marketing.
This limitation constrains what the analysis can say about which channels drive bookings. Keep it in mind whenever a conclusion rests on the marketing_channel field.
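When you do look at the marketing_channel field, it helps to label the numbers for what they are. This sketch tabulates the share of each year's bookings by self-reported channel. booking_date is a hypothetical column name, and the output describes what travelers said, not which channel caused the booking.

```python
from collections import Counter, defaultdict
from datetime import date


def reported_channel_share(rows, date_column="booking_date", channel_column="marketing_channel"):
    """Share of each year's bookings by the channel travelers *said* they used.

    Self-reported only: a traveler who first saw an Instagram post and then
    searched may answer "Google", so these shares are not causal attribution.
    """
    by_year = defaultdict(Counter)
    for row in rows:
        year = date.fromisoformat(row[date_column]).year
        by_year[year][row[channel_column].strip() or "(blank)"] += 1
    return {
        year: {channel: n / sum(counts.values()) for channel, n in counts.items()}
        for year, counts in sorted(by_year.items())
    }
```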
Check: Both datasets loaded. Row counts and column counts confirmed. Date range verified (3 years). Booking growth trend visible. Marketing shift timing identified (~18 months ago). Self-reported attribution field noted.
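The structural checks in this list can be written as a small, rerunnable script. A sketch, assuming hypothetical column names booking_date, month, and channel, that the channel labels in the CSV match the names in the text exactly, and that both tables are lists of dicts from csv.DictReader. The bookings row count is not checked exactly because the text only says "about 6,300".

```python
from datetime import date

# Channel labels as named in the text; the exact spelling in the CSV is an assumption.
EXPECTED_CHANNELS = {"Print", "Instagram", "Google Ads", "Travel Portals"}


def run_checks(bookings, spend, date_column="booking_date", month_column="month",
               channel_column="channel"):
    """Return a list of failed checks; an empty list means everything passed."""
    failures = []
    if not bookings:
        failures.append("bookings table is empty")
    else:
        years = {date.fromisoformat(r[date_column]).year for r in bookings}
        if (min(years), max(years)) != (2023, 2025):
            failures.append(f"booking years span {min(years)}-{max(years)}, expected 2023-2025")
        if "marketing_channel" not in bookings[0]:
            failures.append("marketing_channel column missing")
    if len(spend) != 36 * 4:  # 36 months x 4 channels
        failures.append(f"spend has {len(spend)} rows, expected 144")
    if {r[channel_column] for r in spend} != EXPECTED_CHANNELS:
        failures.append("unexpected set of marketing channels")
    if len({r[month_column][:7] for r in spend}) != 36:
        failures.append("spend does not cover 36 distinct months")
    return failures
```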