Step 1: The senior colleague's nudge
Jamie Park drops a message: "hey -- once you've got DuckDB connected via MCP, try having the AI query the production table directly instead of describing it. the difference in output quality is night and day."
That is the direction for this unit. You are about to connect Claude Code to an external database for the first time.
Step 2: What is DuckDB
DuckDB is a lightweight analytical database. It runs in-process and stores a whole database in a single file -- no server, no installation beyond the DuckDB client itself. You load CSVs into it and query them with SQL. It is designed for exactly the kind of work Budi needs: take data from separate files, load it into one database, and join it.
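Direct AI to install DuckDB. The command below installs the DuckDB Python package, which is enough to create and query a database file; the standalone CLI is a separate install: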
pip install duckdb
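Once the install finishes, a quick check from Python can confirm that the package is importable before anything depends on it. This is a minimal sketch, not part of the unit's required steps; the helper name duckdb_available is made up for illustration.

```python
import importlib.util


def duckdb_available() -> bool:
    """Return True if the duckdb Python package can be imported."""
    return importlib.util.find_spec("duckdb") is not None


if duckdb_available():
    import duckdb

    # In-memory connection: nothing is written to disk.
    con = duckdb.connect(":memory:")
    print("duckdb", duckdb.__version__, "->", con.execute("SELECT 42").fetchone())
    con.close()
else:
    print("duckdb is not installed in this environment")
```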
Step 3: What is MCP
MCP -- Model Context Protocol -- is a standard for connecting AI agents to external tools. When Claude Code is connected to a DuckDB database via MCP, it can read schemas, explore tables, and run queries against live data. The protocol is the same regardless of which AI coding agent you use; only the configuration syntax varies by tool.
This is the first time you connect an external tool. Before this, every interaction with data went through your description -- you told AI what the columns were, and AI worked from that description. After the connection, AI reads the data directly.
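To make "a standard for connecting AI agents to external tools" concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The sketch below builds one such request in Python. The tool name "query" and its "query" argument are hypothetical; a real server advertises its own tool names and argument schemas.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical tool name and argument; check the server's tool list
# for the names it actually exposes.
request = tool_call_request(
    1, "query", {"query": "SELECT COUNT(*) FROM sensor_readings"}
)
print(json.dumps(request, indent=2))
```

You never write these messages by hand. The agent sends them for you, which is why the same server works with any MCP-capable tool.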
Step 4: Create the database and load the data
Direct AI to create a DuckDB database and load both CSVs:
Create a DuckDB database called shrimp.db in the project directory. Load sensor-readings.csv as a table called sensor_readings and production-records.csv as a table called production_records. Verify both tables loaded correctly.
AI will create the database file, load both CSVs, and confirm the table structures.
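What AI does here can be sketched in a few lines of Python. This is one plausible version, assuming the duckdb Python package and DuckDB's read_csv_auto function for type inference; the code AI actually writes may differ. The helper names build_load_sql and load_all are made up for illustration.

```python
from pathlib import Path

# Table name -> CSV file, as named in the prompt above.
SOURCES = {
    "sensor_readings": "sensor-readings.csv",
    "production_records": "production-records.csv",
}


def build_load_sql(table: str, csv_path: str) -> str:
    """Return SQL that (re)creates a table from a CSV with inferred types."""
    return (
        f"CREATE OR REPLACE TABLE {table} AS "
        f"SELECT * FROM read_csv_auto('{csv_path}')"
    )


def load_all(db_path: str = "shrimp.db", sources: dict = SOURCES) -> dict:
    """Load every CSV into the database file and return row counts per table."""
    import duckdb  # imported lazily so the SQL builder works without it

    con = duckdb.connect(db_path)  # creates the file if it does not exist
    counts = {}
    try:
        for table, csv_path in sources.items():
            if not Path(csv_path).exists():
                raise FileNotFoundError(csv_path)
            con.execute(build_load_sql(table, csv_path))
            counts[table] = con.execute(
                f"SELECT COUNT(*) FROM {table}"
            ).fetchone()[0]
    finally:
        con.close()
    return counts


# Show the statements without touching any files.
for table, csv_path in SOURCES.items():
    print(build_load_sql(table, csv_path))
```

Calling load_all() from the project directory would create shrimp.db and return the row count of each table, which is one way to verify that both loads succeeded.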
Step 5: Configure the MCP server
Direct AI to install the DuckDB MCP server and add it to the Claude Code configuration:
Install the DuckDB MCP server and configure it in Claude Code to connect to shrimp.db in this project directory.
AI will install the MCP server package, create or update the MCP configuration file, and register the DuckDB server.
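The resulting configuration is typically a small JSON document. The sketch below generates one in Python under several assumptions: that the project-scoped file is .mcp.json with a top-level mcpServers key, that the server is the mcp-server-motherduck package launched through uvx, and that it accepts a --db-path flag. The server AI chooses, and its flags, may differ, so treat this as an illustration of the shape rather than the exact file.

```python
import json


def build_mcp_config(db_path: str) -> dict:
    """Return an MCP configuration that registers one DuckDB server."""
    return {
        "mcpServers": {
            # "duckdb" is just a label for this server entry.
            "duckdb": {
                # Assumed launcher and package; adjust to the server in use.
                "command": "uvx",
                "args": ["mcp-server-motherduck", "--db-path", db_path],
            }
        }
    }


config = build_mcp_config("shrimp.db")
print(json.dumps(config, indent=2))
```

Whatever AI produces, check that the database path in the configuration points at shrimp.db in this project directory.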
Step 6: Restart with MCP active
Restart Claude Code so the MCP connection loads. Direct AI to explore the database schema -- table names, column names, types. Compare what AI reports now (from direct schema access) with what it reported in Unit 1 (from your description of the CSV files).
The difference is immediate. AI now knows the actual column names, data types, and row counts without you describing anything. It can run SQL queries directly.
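If you want to see the same schema information outside the MCP session, it can be read from DuckDB's information_schema. The sketch below assumes the duckdb Python package and the default main schema; group_columns and read_schema are hypothetical helper names.

```python
from collections import defaultdict

SCHEMA_SQL = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'main'
ORDER BY table_name, ordinal_position
"""


def group_columns(rows):
    """Group (table, column, type) rows into {table: [(column, type), ...]}."""
    schema = defaultdict(list)
    for table, column, data_type in rows:
        schema[table].append((column, data_type))
    return dict(schema)


def read_schema(db_path: str = "shrimp.db") -> dict:
    """Read table and column metadata from the database file."""
    import duckdb  # imported lazily; only needed when a database is read

    # Read-only: inspecting the schema should never modify the data.
    con = duckdb.connect(db_path, read_only=True)
    try:
        return group_columns(con.execute(SCHEMA_SQL).fetchall())
    finally:
        con.close()
```

Calling read_schema() from the project directory returns a dictionary keyed by table name, which you can compare against what AI reports through MCP.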
Step 7: Run the first connected query
Direct AI to query the sensor_readings table for the count of distinct sensor IDs. AI generates and executes SQL against the live database -- returning the result without you describing the table structure first.
This is the capability shift. Before the connection, AI worked from your description of the data; now it reads the data itself. The infrastructure changed what AI can do.
Check: Direct AI to query the sensor_readings table for the count of distinct sensor IDs. AI should return 3 (SID-001, SID-003, SID-006) without you describing the table structure. The query runs through MCP, not through a described schema.
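To cross-check the result outside the MCP session, the same query can be run directly with the duckdb Python package. This is a sketch under one stated assumption: the column name sensor_id is hypothetical, so substitute whatever name the schema exploration reported.

```python
def distinct_count_sql(table: str, column: str) -> str:
    """Return SQL that counts the distinct values in one column."""
    return f"SELECT COUNT(DISTINCT {column}) FROM {table}"


def count_distinct(db_path: str, table: str, column: str) -> int:
    """Run the distinct-count query against a DuckDB database file."""
    import duckdb  # imported lazily; only needed when the query runs

    con = duckdb.connect(db_path, read_only=True)
    try:
        return con.execute(distinct_count_sql(table, column)).fetchone()[0]
    finally:
        con.close()


# "sensor_id" is an assumed column name, not taken from the data.
print(distinct_count_sql("sensor_readings", "sensor_id"))
```

Running count_distinct("shrimp.db", "sensor_readings", "sensor_id") should agree with the number AI returned through MCP.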