Learn by Directing AI

The Brief

Budi Hartono owns a shrimp farm in Sidoarjo, East Java. Eight circular ponds producing vannamei shrimp for export markets. Two harvest cycles per year. Six months ago, he installed IoT sensors in three ponds to monitor water quality -- pH, dissolved oxygen, temperature, salinity.

He also has two years of production records for all eight ponds. The data lives in two separate systems and he cannot connect them.

His question is straightforward: does water quality explain why some harvests are better than others? A friend at a tech meetup told him SQL might help. He just wants to understand his ponds better.

Your Role

You deliver a multi-source analysis that connects Budi's sensor readings to his production records. The analytical work is manageable -- profiling, joining, correlating. You have done similar analysis before.

What changes is how AI accesses the data. For the first time, you connect Claude Code to an external database. Instead of describing your data to AI and having it work from your description, AI reads the data directly. That shift changes what AI can do -- and what you need to verify.

What's New

Last time, you built AI infrastructure -- authored CLAUDE.md and AGENTS.md, encoded your analytical conventions, and experienced the difference between a cold-start session and one where AI loads those conventions from its first prompt.

This time, you connect AI to the data itself. The MCP (Model Context Protocol) connection lets AI query Budi's database directly -- exploring tables, reading schemas, running SQL. The output quality difference between "AI works from your description" and "AI works from the actual data" is immediate.

The hard part is not the connection. It is learning what to verify when AI makes its own data access decisions -- which tables to scan, which columns to join on, which filters to apply. Verification now extends to the data path, not just the statistical output.

Tools

  • Python 3.11+ via your conda "ds" environment
  • Jupyter Notebook for the analysis
  • DuckDB -- a lightweight analytical database (new)
  • DuckDB MCP server -- connects Claude Code to the database (new)
  • pandas for data handling
  • statsmodels for hypothesis tests
  • scipy for statistical tests
  • matplotlib / seaborn for visualization
  • Claude Code as the AI you direct
  • Git / GitHub for version control

Materials

You receive:

  • Sensor readings from three ponds (6 months of hourly water quality data)
  • Production records for all eight ponds (2 years of per-cycle harvest data)
  • A data dictionary describing both datasets
  • No pre-built database -- you load the CSVs into DuckDB as part of the work
  • No methodology template -- you structure the analysis yourself