Step 1: Design the verification plan with meta-prompting
You have a versioned feature pipeline, embedding features, and a production monitoring dashboard. Verifying each piece individually is necessary but not sufficient. The system-level question is whether they work together.
You may not be sure what a comprehensive verification plan looks like for this system. That uncertainty is a directing opportunity. Ask Claude:
I have a versioned feature pipeline with DVC, embedding-based features using sentence-transformers, and a production monitoring dashboard with disaggregated metrics. Help me design a comprehensive verification plan. What should I test to confirm the full system works end-to-end?
Claude will generate a verification plan. Read it critically. Does it cover DVC reproducibility? Does it check the embedding train/test boundary? Does it test the monitoring's sensitivity to drift? Does it verify the dashboard shows accurate disaggregated metrics? Claude's verification plans tend to be biased toward failure modes it recognizes from common patterns -- it may miss context-specific checks that matter for this particular system.
Adjust the plan. Add checks it missed. Remove checks that are redundant. This is meta-prompting: using Claude to help design the verification plan, then evaluating that design with your own judgment.
Step 2: Execute the verification plan
Run every check in the plan.
DVC reproducibility: check out a previous Git commit, run dvc checkout, and confirm the regenerated features match byte-for-byte. This confirms that any historical experiment is reproducible.
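One way to automate this check is to hash every feature file before and after the checkout and compare the two snapshots. A minimal sketch, assuming the features live in a directory you can point at (the function and directory names here are illustrative, not part of the project):

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """MD5 of a file's contents, read in chunks to handle large files."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot_hashes(feature_dir: str) -> dict:
    """Map each file under feature_dir (relative path) to its content hash."""
    root = Path(feature_dir)
    return {
        str(p.relative_to(root)): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

# Usage sketch: take one snapshot now, run
#   git checkout <old-commit> && dvc checkout
# then take a second snapshot and assert the two dicts are equal.
```

Comparing hashes rather than eyeballing file sizes catches silent content changes, which is the failure mode that matters here.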
Embedding boundary: confirm that embeddings were generated on the training set first, then applied to the test set. Check the sample counts -- the training and test sets should be disjoint and together account for every sample.
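The count check can be scripted as two assertions: no sample ID appears in both splits, and the splits together cover the full dataset. A minimal sketch with hypothetical function and argument names:

```python
def check_split_boundary(train_ids, test_ids, n_total):
    """Verify the train/test split is disjoint and complete.

    Raises AssertionError if any ID leaks across the boundary or
    if the counts don't add up to the full dataset size.
    """
    train, test = set(train_ids), set(test_ids)
    assert not (train & test), "train/test overlap: leakage risk"
    assert len(train) + len(test) == n_total, "sample counts don't add up"
    return True
```

A passing check does not prove the embeddings were fit in the right order, only that the split itself is clean -- the generation order still needs to be confirmed against the pipeline code.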
Monitoring sensitivity: run the monitoring on the production data. Confirm the simulated drift triggers an alert for the 25-34 age group. Confirm the aggregate view does not show the problem. Confirm the disaggregated view does.
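The aggregate-vs-disaggregated contrast can be illustrated with a simple mean-shift rule on synthetic numbers. The groups, values, and 0.2 threshold below are made up for illustration -- they are not the project's real monitoring logic:

```python
def mean_shift_by_group(reference, production, threshold=0.2):
    """Flag each group whose production mean drifted beyond threshold
    relative to its reference mean."""
    alerts = {}
    for group in reference:
        ref_mean = sum(reference[group]) / len(reference[group])
        prod_mean = sum(production[group]) / len(production[group])
        alerts[group] = abs(prod_mean - ref_mean) > threshold
    return alerts

# Synthetic example: only the 25-34 group has drifted.
reference = {"18-24": [0.50, 0.52, 0.48], "25-34": [0.50, 0.51, 0.49]}
production = {"18-24": [0.49, 0.51, 0.50], "25-34": [0.20, 0.22, 0.18]}

alerts = mean_shift_by_group(reference, production)
```

Pooling all six values per side gives an aggregate shift of about 0.15, under the threshold -- the aggregate view stays quiet while the per-group view alerts on 25-34, which is exactly the behavior the check is meant to confirm.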
Dashboard accuracy: verify that the numbers on the dashboard match the computed values. A dashboard that displays incorrect metrics is worse than no dashboard.
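A spot-check can recompute each metric from the raw values and compare it to what the dashboard displays, within a float tolerance. The metric names and numbers below are hypothetical:

```python
import math

def verify_dashboard(displayed, raw_values, tol=1e-6):
    """Recompute each metric as a mean of its raw values and collect
    any (metric, displayed, recomputed) triples that disagree."""
    mismatches = []
    for metric, values in raw_values.items():
        recomputed = sum(values) / len(values)
        if not math.isclose(displayed[metric], recomputed, abs_tol=tol):
            mismatches.append((metric, displayed[metric], recomputed))
    return mismatches

# Illustrative data: both displayed values match the recomputation.
displayed = {"ctr_25_34": 0.20, "ctr_18_24": 0.50}
raw = {"ctr_25_34": [0.20, 0.22, 0.18], "ctr_18_24": [0.49, 0.51, 0.50]}
```

An empty mismatch list means the dashboard agrees with the source computation; anything else pinpoints exactly which metric drifted from its ground truth.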
Step 3: Document the versioning strategy
Write documentation for the feature versioning system. This is a communication artifact -- the next person who works on this codebase needs to understand what is tracked, why, and how to reproduce any experiment.
Cover: what DVC tracks (raw data, tabular features, embedding features, model artifacts), how to reproduce a historical experiment (check out the commit, run dvc checkout, run dvc repro), and how the feature store is organized (separate directories for raw, prepared, features, and model).
Step 4: Push to GitHub
Ask Claude to create a README that covers the project: what was built, how to reproduce, what the monitoring tracks, and how to interpret the dashboard. Then push the full project to GitHub.
The README should be specific to this project -- not a generic template. Someone reading it should understand why features are versioned, what the monitoring catches, and how to use the dashboard.
Step 5: Deliver to Max
Present the complete system to Max: a versioned feature pipeline with tabular and embedding features, a recommendation model, and a monitoring dashboard with business-language metrics for his merchandising team.
Max responds with enthusiasm. The recommendations are better. The monitoring gives his team visibility. The versioning means they can trace any recommendation back to the features that produced it.
He has one more thought: "Next project -- we do the visual similarity thing, ja?" Image-based features, deferred from Unit 3. A good sign. The work is done, and the client is already thinking about what comes next.
Check: all verification checks pass -- DVC reproduces features exactly, embeddings are generated only on training data, the monitoring catches the simulated drift in the disaggregated metrics, and the README documents the full system.