Step 1: Implement watermark monitoring
The pipeline runs on a schedule. Data loads. Tests pass. Everything looks green. But if the source stops sending new data, the pipeline still runs successfully -- it just loads zero new records. No error. No alert. The morning report shows yesterday's numbers because no new numbers arrived.
This is the failure that watermark monitoring catches. After each incremental run, check that the watermark advanced. Log the watermark value and compare to the previous run. A watermark that has not advanced in 48 hours means the pipeline is running but loading nothing.
Configure a Dagster sensor or schedule-based check that tracks watermark progression. The check should compare the current maximum mill_date (or processing_date) against the last recorded maximum. If the watermark has not advanced since the previous run, flag it.
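The comparison itself is small. A minimal sketch in shell, assuming the current max mill_date has already been queried from the warehouse -- check_watermark, the state-file path, and the sample date are all illustrative:

```shell
#!/bin/bash
# Sketch of the watermark-progression check. Assumes the current max
# mill_date was already queried from the warehouse; the state-file path
# and sample date are illustrative.
STATE_FILE="${STATE_FILE:-/tmp/watermark_state}"

check_watermark() {
  local current="$1" prev=""
  [ -f "$STATE_FILE" ] && prev=$(cat "$STATE_FILE")
  if [ -n "$prev" ] && [ "$current" = "$prev" ]; then
    echo "ALERT: watermark stuck at $current -- pipeline runs but loads nothing"
    return 1
  fi
  # Watermark advanced (or this is the first run): record it for next time.
  echo "$current" > "$STATE_FILE"
  echo "watermark advanced to $current"
}

# Demo: a second check with an unchanged watermark flags a stall.
rm -f "$STATE_FILE"
FIRST_STATUS=0
check_watermark "2024-06-01" || FIRST_STATUS=$?
SECOND_STATUS=0
check_watermark "2024-06-01" || SECOND_STATUS=$?
```

Inside Dagster, the same comparison would live in the sensor's evaluation function, with the previous watermark kept in the sensor cursor rather than a temp file; storing a timestamp alongside the value lets the check enforce the 48-hour rule rather than just run-over-run equality.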
Step 2: Set up the pre-commit hook
A pre-commit hook runs automatically before every git commit. If it fails, the commit is rejected. This is verification infrastructure that fires regardless of whether you remember to run the tests manually.
Set up a pre-commit hook that runs dbt test. Every commit triggers the test suite. If any test fails, the commit does not proceed.
Create the hook at .git/hooks/pre-commit and make it executable (chmod +x .git/hooks/pre-commit) -- git silently skips hook files that are not executable. The script runs dbt test and exits with the test command's exit code. A non-zero exit code means tests failed, and git rejects the commit.
#!/bin/bash
# Pre-commit hook: run the dbt test suite before every commit.
echo "Running dbt tests..."
# dbt test is the last command, so its exit status becomes the hook's
# exit status; non-zero rejects the commit.
dbt test
Verify it works: intentionally break a model or add a test you know will fail. Attempt to commit. The hook should catch the failure and reject the commit.
Step 3: Set up the pre-push PII hook
The rice mill data includes farmer names. Set up a pre-push hook that scans staged files for PII patterns -- names, phone numbers, identification numbers that should not appear in files being pushed to a remote repository.
Configure a regex-based scan. Test it with a file that contains a farmer name where it should not be. The hook should catch it and reject the push.
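As a sketch of the core scan logic, assuming one illustrative pattern (a phone-number shape) -- scan_for_pii and the sample file are hypothetical names, and real patterns depend on the project's data:

```shell
#!/bin/bash
# Sketch of the PII scan at the core of a pre-push hook. The regex and
# the sample file are illustrative; tune patterns to the project's data.
scan_for_pii() {
  local file="$1"
  # One illustrative pattern: a North-American phone-number shape.
  if grep -E -q '[0-9]{3}[- ][0-9]{3}[- ][0-9]{4}' "$file"; then
    echo "PII pattern found in $file -- push should be rejected"
    return 1
  fi
  return 0
}

# Demo on a temporary file that contains a phone number.
TMP=$(mktemp)
echo "farmer contact: 555-867-5309" > "$TMP"
SCAN_STATUS=0
scan_for_pii "$TMP" || SCAN_STATUS=$?
rm -f "$TMP"
```

The real pre-push hook would run this over the files about to be pushed (for example, git diff --name-only @{u}..HEAD) and could add a fixed-string denylist of known farmer names via grep -F -f, since names rarely match a regex shape.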
This is lighter than the governance work in P6, but the pattern is the same: automated infrastructure that catches what manual review misses.
Step 4: Test the fail-open path
What happens when the dbt test runner is not available? If the hook script runs dbt test and dbt is not installed, what does the hook return?
If the hook exits with code 0 when dbt is missing, the hook is fail-open. Every commit passes because the check cannot execute, not because the tests passed. This gives false confidence -- you think the hook is protecting you, but it is doing nothing.
Configure the hook to fail closed. If dbt test cannot execute -- command not found, configuration error, any execution failure -- the hook should exit with a non-zero code and print a clear message. The commit is rejected until the test runner works.
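A sketch of the fail-closed hook follows. It is written to a temporary file here so the fail-closed path can be demonstrated even on a machine where dbt is installed; in the repository it would live at .git/hooks/pre-commit, and the messages are illustrative:

```shell
#!/bin/bash
# Fail-closed pre-commit hook sketch, written to a temp file so the
# fail-closed path can be exercised below.
HOOK=$(mktemp)
cat > "$HOOK" <<'EOF'
#!/bin/bash
# Fail closed: if the test runner cannot execute at all, reject the commit.
if ! command -v dbt >/dev/null 2>&1; then
  echo "pre-commit: dbt not found -- failing closed, commit rejected" >&2
  exit 1
fi
dbt test  # last command: its exit status becomes the hook's exit status
EOF
chmod +x "$HOOK"

# Demo: with dbt guaranteed off the PATH, the hook must exit non-zero.
HOOK_STATUS=0
PATH=/nonexistent "$HOOK" 2>&1 || HOOK_STATUS=$?
rm -f "$HOOK"
```

A fail-open version would exit 0 at the same point, and every commit would then pass precisely because the check could not run.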
The difference between fail-open and fail-closed is the difference between verification infrastructure that works and verification infrastructure that only looks like it works.
Step 5: Update CLAUDE.md
The project memory should reflect the current state of the project. You have added infrastructure since Unit 2: watermark monitoring, pre-commit hooks, pre-push PII scanning, fail-closed configuration.
Update CLAUDE.md with these additions. Future AI sessions on this project need to know: what hooks exist, what they check, what happens when they fail, how watermark monitoring is configured. An AI session that does not know about the pre-commit hook will not know to run tests before suggesting a commit.
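One illustrative shape for the addition -- the section name and wording are examples, not a required format:

```markdown
## Verification infrastructure (added since Unit 2)

- Pre-commit hook (.git/hooks/pre-commit): runs dbt test; the commit is
  rejected on any test failure. Fails closed: if dbt cannot execute,
  the hook exits non-zero and the commit is rejected.
- Pre-push hook: regex scan of outgoing files for PII patterns (farmer
  names, phone numbers, identification numbers); the push is rejected
  on a match.
- Watermark monitoring: after each incremental run, the current max
  mill_date is compared to the previous run's value; a watermark that
  has not advanced in 48 hours is flagged.
```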
The project memory grows as the project grows. A CLAUDE.md that still describes only the initial data sources and field mappings is incomplete -- it is missing the infrastructure decisions that shape how the project works now.
Check: Intentionally introduce a failing dbt test. Attempt to commit. Does the pre-commit hook reject the commit?