Data Science: Track Setup

Complete the platform setup first if you haven't already. You should have a terminal, Claude Code, Git, and a GitHub account ready.

1. Create your track folder

mkdir -p ~/dev/data-science
cd ~/dev/data-science

2. Data science tools: let Claude Code do it

Open Claude Code in your track folder:

claude

Paste this prompt:

I'm setting up a data science environment. Please:

1. Install Python 3.11+ via Miniconda, then create a conda environment called "ds"
2. Install core packages in the ds environment: pandas, jupyter, matplotlib, seaborn, 
   scipy, statsmodels, scikit-learn, plotly
3. Check if Docker is installed. If not, tell me how to install it (it needs admin access)

After each step, verify it worked and show me the result.

Note on Docker: Docker typically needs administrator access. If Claude Code can't install it directly, it will tell you what command to run yourself.

Verify

Once Claude Code finishes:

conda activate ds
python --version
python -c "import pandas; import matplotlib; import seaborn; import scipy; import statsmodels; import sklearn; import plotly; print('All packages installed')"
jupyter notebook --version

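You should see a Python version of 3.11 or later, "All packages installed", and a Jupyter version number. If an import fails, Python prints a ModuleNotFoundError naming the missing package; ask Claude Code to install it into the ds environment and run the check again.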
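
If you would rather see a version number for each package than a single success message, the check can be written as a short script. This is a minimal sketch, not part of the official setup: the function name check_packages and the REQUIRED list are illustrative, and it uses only the standard library, so it reports missing packages instead of crashing. Jupyter is left out because the command above already checks it.

```python
from importlib import metadata

# Distribution names from the setup prompt (scikit-learn is imported as `sklearn`,
# but it is installed under the name "scikit-learn").
REQUIRED = [
    "pandas", "matplotlib", "seaborn", "scipy",
    "statsmodels", "scikit-learn", "plotly",
]


def check_packages(names):
    """Return {name: installed version, or None if the package is not found}."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:15s} {version or 'MISSING'}")
```

Save it under any name you like (for example check_env.py) and run it with python while the ds environment is active. Any line that says MISSING names a package to reinstall.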

3. Your first look

Everything is installed. Before you start Project 1, see what Claude Code can do when you point it at a data science problem.

Stay in your track folder with Claude Code open, and paste this:

Create a small CSV dataset of 300 hospital appointments with columns: patient_age, 
day_of_week, lead_time_days, sms_reminder_sent, no_show. About 20% should be no-shows. 
Then explore the data: profile it, check for patterns in no-shows by age group and 
day of week, run a chi-squared test on sms_reminder_sent vs no_show, and produce 3 
visualizations that tell the story. Summarize the findings in plain language.

In a few minutes, Claude will generate the data, run the analysis, produce charts, and write a summary. A complete analytical workflow from a single prompt.

As you work through the track, you'll learn why a single prompt isn't enough: why that chi-squared test might have violated its assumptions, why those visualizations might be misleading, why "20% no-show rate" might hide important subgroup differences, and why a client would need you to explain what the findings actually mean for their business.
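
To make the point about assumptions concrete, here is a rough sketch of a chi-squared test on a 2x2 table, written out by hand so the expected counts are visible. The counts are hypothetical (they are not output from the prompt above), the function name chi_squared_2x2 is illustrative, and the "every expected count should be at least 5" check is a common rule of thumb rather than a hard law. In practice you would likely call scipy.stats.chi2_contingency, which also returns the expected counts.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table of counts.

    Returns (statistic, p_value, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell if the two variables were independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, the chi-squared tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


# Hypothetical counts. Rows: reminder sent / not sent. Columns: showed / no-show.
# 60 no-shows out of 300 appointments gives the 20% overall rate.
full_table = [[150, 30], [90, 30]]
stat, p, expected = chi_squared_2x2(full_table)
print(f"chi2={stat:.3f}, p={p:.3f}")  # chi2=3.125, p=0.077

# The same question asked of a small subgroup (30 hypothetical appointments).
small_table = [[18, 2], [7, 3]]
_, _, small_expected = chi_squared_2x2(small_table)
smallest = min(min(row) for row in small_expected)
print(f"smallest expected count: {smallest:.2f}")  # 1.67, below the usual threshold of 5
```

The full table passes the rule of thumb, but the subgroup does not, so a p-value computed on it would be unreliable. A generated analysis will not necessarily check for this or flag it.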

But for now, look at what just happened. That's the starting point.

Ready

Start Project 1.