Learn by Directing AI

The Brief

Luciana Moretti runs a small family winery in Mendoza, Argentina. Three vineyard plots at different altitudes, Malbec and Cabernet Sauvignon, about 15,000 bottles a year. Every harvest she tastes through hundreds of barrel samples and decides which ones become Reserve -- four times the price of her standard line.

She knows she is inconsistent. Some years generous, some years strict. And she is getting busier. New export contracts mean more barrels, which means more tasting time she does not have.

She has five years of production data: fermentation temperatures, altitude, rainfall, soil analysis, barrel aging, and quality scores from blind tasting panels. About 3,000 barrel samples. She wants a model that predicts which barrels are likely to score high enough for Reserve, so she can focus her tasting on the borderline cases instead of tasting everything.

Your Role

You build a classification model that predicts Reserve designation from production data. You evaluate it honestly -- not with a single accuracy number, but with metrics that show exactly what the model catches, what it misses, and what it falsely flags. You translate the results into language Luciana can use with her export partners.

The scaffolding is the same as last time. You have a methodology memo template, verification guidance, and cross-model review. What changes is the terrain: classification on imbalanced data requires different evaluation from everything you have done before. The tools are familiar. The judgment calls are new.

What's New

Last time, you combined three datasets, ran inferential analysis with assumption checking, and built a prediction model with regularization. You learned to verify through cross-model review and to communicate with effect sizes.

This time, the data is a single source -- no joining work. But only about 8% of barrels make Reserve. That imbalance changes everything about how you evaluate a model. A model that never predicts Reserve achieves 92% accuracy and catches nothing. The metrics that worked for regression do not apply here. Precision, recall, confusion matrices, and ROC curves replace RMSE and R-squared.
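That 92%-accuracy trap is easy to reproduce. A minimal sketch with synthetic labels (hypothetical stand-ins for the real barrel data, using the brief's ~8% Reserve rate and 3,000 samples) shows how accuracy flatters a do-nothing model while recall and the confusion matrix expose it:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

rng = np.random.default_rng(0)
# Hypothetical labels: roughly 8% of 3,000 barrels make Reserve
y_true = (rng.random(3000) < 0.08).astype(int)

# A "model" that never predicts Reserve
y_never = np.zeros_like(y_true)

print(accuracy_score(y_true, y_never))                  # around 0.92
print(recall_score(y_true, y_never, zero_division=0))   # 0.0 -- catches no Reserve barrels
print(confusion_matrix(y_true, y_never))                # every Reserve barrel is a false negative
```

The confusion matrix makes the failure visible: the entire Reserve class lands in the false-negative cell, which accuracy alone never reveals.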

The hard part is not building the model. It is figuring out whether the model actually does what Luciana needs -- and that question depends on which errors she can live with.
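The "which errors can she live with" question is, concretely, a decision-threshold question. A sketch with hypothetical model scores (synthetic, not from the real dataset) shows the tradeoff: lowering the threshold misses fewer Reserve barrels (higher recall) but flags more non-Reserve barrels for tasting (lower precision):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
y_true = (rng.random(3000) < 0.08).astype(int)
# Hypothetical scores: Reserve barrels tend to score higher, with overlap
scores = np.clip(rng.normal(0.3 + 0.4 * y_true, 0.15), 0, 1)

for threshold in (0.7, 0.5, 0.3):
    y_pred = (scores >= threshold).astype(int)
    p = precision_score(y_true, y_pred, zero_division=0)
    r = recall_score(y_true, y_pred)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

Neither threshold is "correct" in the abstract. If a missed Reserve barrel costs Luciana four times the bottle price, she likely wants high recall and accepts extra tasting; the model only sets the menu of tradeoffs, she picks the point on it.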

Tools

  • Python 3.11+ via your conda "ds" environment
  • Jupyter Notebook for the analysis
  • pandas for data handling
  • scikit-learn for classification models, confusion matrices, ROC curves, and precision/recall
  • scipy for statistical checks
  • matplotlib / seaborn for visualization
  • Claude Code as the AI you direct
  • Git / GitHub for version control
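How these tools fit together can be sketched end to end. Everything below is illustrative: the features are random stand-ins for the real columns (fermentation temperature, altitude, rainfall, and so on), and logistic regression with `class_weight="balanced"` is one reasonable choice for imbalanced classes, not the required model. The last step ranks barrels by how close their predicted probability sits to 0.5, which is one way to surface the borderline cases Luciana wants to taste herself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
# Hypothetical features standing in for the real production columns
X = rng.normal(size=(3000, 5))
# Synthetic rare-positive labels, loosely mimicking the Reserve imbalance
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=3000) > 2.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" reweights the loss so the rare class is not ignored
clf = LogisticRegression(class_weight="balanced").fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
# Barrels with probability near 0.5 are the borderline cases worth tasting
borderline = np.argsort(np.abs(proba - 0.5))[:20]
```

From here, the evaluation metrics in the list above (confusion matrix, precision/recall, ROC) apply to `proba` and the test labels, not to the training data.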

Materials

You receive:

  • A barrel production dataset with five years of production data and quality scores
  • A data dictionary explaining each column
  • A methodology memo template to fill in as you work
  • A project governance file (CLAUDE.md) for Claude Code
  • Luciana's voicemail explaining what she needs

Same scaffolding as last time: templates, verification guidance, cross-model review. No provided answer key.