Learn by Directing AI
All materials

migration-guide.md

Schema Migration Guide: Foreign Keys and Relational Integrity

What a schema migration is

A schema migration is a controlled change to your database structure. When you add a table, rename a column, or add a foreign key, that change is a migration. The key difference from creating a schema from scratch: your database already has data. Every change must either work with the existing data or transform it as part of the migration.

Migrations run in order. Each migration file is numbered or timestamped. The database tracks which migrations have run. When you deploy, the migration runner applies any new migrations that haven't been executed yet. This is how your schema evolves without losing data.

If a migration fails partway through, you need a rollback strategy. Most migration tools support reversible migrations -- an "up" function that applies the change and a "down" function that undoes it.

Foreign keys and ON DELETE

A foreign key is a column in one table that references the primary key of another table. It's a structural claim: "every value in this column corresponds to an existing row in that other table."

ALTER TABLE batches
ADD CONSTRAINT batches_farm_id_fkey
FOREIGN KEY (farm_id) REFERENCES farms(id)
ON DELETE RESTRICT;

The ON DELETE clause determines what happens when you try to delete the referenced row. This is a design decision, not a syntax choice.

RESTRICT: Refuse the delete. If Finca Rosario has batches referencing it, you cannot delete Finca Rosario. The database returns an error. Use RESTRICT when the referenced data must always exist -- provenance records, audit trails, anything where deletion would destroy important history.

CASCADE: Delete the referencing rows too. If you delete Finca Rosario, all batches from Finca Rosario are also deleted, and all products from those batches, and all order items for those products. This can be destructive. Use CASCADE only when the child records have no independent meaning without the parent -- for example, deleting a draft order might cascade to its line items.

SET NULL: Set the foreign key column to NULL. If you delete Finca Rosario, the batches that referenced it still exist, but their farm_id becomes NULL. Use SET NULL when the referencing rows should survive but the relationship is no longer valid -- for example, a discontinued supplier whose past batches still have value.

There is no neutral choice. Each option preserves some data properties and sacrifices others. When you add a foreign key without specifying ON DELETE, PostgreSQL defaults to NO ACTION (similar to RESTRICT but with different transaction timing). Always specify the behaviour explicitly.

Running migrations with existing data

When you add a foreign key to a table that already has data, every existing row must satisfy the new constraint. If the batches table has a row with farm_id = 7 but no farm with id = 7 exists, the migration fails.

Before running the migration:

  1. Check for orphaned references: SELECT * FROM batches WHERE farm_id NOT IN (SELECT id FROM farms);
  2. Either fix the data (assign a valid farm_id or delete the orphaned rows) or adjust the constraint
  3. Run the migration
  4. Verify the constraint is active: try inserting a row with an invalid foreign key -- it should fail

Common gotcha: the migration succeeds in development (where the seed data is clean) but fails in production (where real data has inconsistencies). Always check for orphaned references before deploying a FK migration.

Testing after migration

After adding foreign keys, test three scenarios:

  1. Valid references: Insert a row with a valid foreign key. It should succeed.
  2. Invalid references: Insert a row with an invalid foreign key. It should fail with a constraint violation error.
  3. Delete behaviour: Delete a referenced row and verify the ON DELETE behaviour matches your design decision. If you chose RESTRICT, the delete should fail. If CASCADE, the child rows should be gone. If SET NULL, the child rows should have NULL in the foreign key column.

Test with the edge cases too: what happens when there are zero child rows? What happens when there are many? The constraint should behave the same regardless of how many rows are affected.