Learn by Directing AI
Unit 3

Schema Migration

Step 1: Review the migration approach

Open materials/migration-guide.md again -- the section on running migrations with existing data.

Your database already has data. Farms, batches, products, orders -- real rows that Marco's team entered. Adding a foreign key constraint means every existing row must satisfy the new relationship. If a batch references farm_id = 7 and no farm with id = 7 exists, the migration fails.

Before writing any migration SQL, check for orphaned references. A query like SELECT * FROM batches WHERE farm_id NOT IN (SELECT id FROM farms) reveals rows that would violate the new constraint. Fix the data first, then add the constraint.

Step 2: Write the migration

Direct Claude to write the migration SQL. Add foreign keys to the existing tables based on your design from Unit 2:

  • batches.farm_id referencing farms.id
  • products.batch_id referencing batches.id
  • order_items.product_id referencing products.id
  • order_items.order_id referencing orders.id
  • orders.customer_id referencing customers.id

Each foreign key must have an explicit ON DELETE behaviour -- the decisions you made in Unit 2. AI commonly omits the ON DELETE clause, defaulting to NO ACTION. Check every foreign key in the generated SQL. If the ON DELETE clause is missing, add it.

Run the migration. If it fails on existing data, read the error -- it tells you which rows violate the constraint. Fix those rows and run again.

Step 3: Seed relational test data

Open materials/seed-data.sql.

This seed data is designed with specific edge cases. Three farms, five batches, eight products, four customers, six orders with twelve order items. The edge cases matter:

  • Batch 2024-H2-05 has a product (Floral Reserve Bar) but zero orders. No order_item references any product from this batch.
  • Customer 4 (Antigua Gourmet) exists but has zero orders.
  • Farm 3 (Finca El Mirador) has only one batch.

These aren't random. They're the cases where queries break. A batch with no orders tests whether your JOIN includes or excludes it. A customer with no orders tests whether the customer endpoint returns an empty list or an error.

Direct Claude to insert this seed data. Verify the foreign key references are consistent -- every farm_id in batches points to a real farm, every batch_id in products points to a real batch.

Step 4: Write the trace query

The trace starts with a batch and fans out in both directions. Given batch 2024-H2-03 (the Munich incident batch), you need:

  • The farm it came from (Finca Rosario)
  • All products made from this batch
  • All order items for those products
  • All orders containing those items
  • All customers who placed those orders

Direct Claude to write this query. It involves JOINs across multiple tables. Run it against the seed data.

Now run the same query for batch 2024-H2-05 -- the one with zero orders. Does the result include the batch and farm? Or does the batch silently disappear?

If the query uses INNER JOIN, batch 2024-H2-05 vanishes from the results because there are no matching rows on the orders side. Switch to LEFT JOIN and run again. The batch appears with NULL values in the order columns.

The join type is a claim about what missing data means. INNER JOIN says "only show me batches that have orders." LEFT JOIN says "show me all batches, and include their orders if they have any." For traceability, you need LEFT JOIN -- a batch with no orders is still a batch worth knowing about.

Step 5: Verify the trace

Run the trace query for multiple scenarios:

  • Batch with multiple orders (2024-H2-03): returns farm, products, multiple orders across multiple customers
  • Batch with one order: returns the single chain
  • Batch with no orders (2024-H2-05): returns farm and batch with null order data
  • Farm with multiple batches (Finca Rosario): GET /farms/1/batches returns both batches

The trace should work in both directions. From batch to farm (upstream), and from batch to orders to customers (downstream). Verify both directions return correct data.

✓ Check

Check: Run the trace query for a specific batch. Does it return the farm, the batch, and all orders? Run it for a batch with no orders. Does it return the farm and batch with zero orders, or does it drop the batch entirely?