ML P8: Feature Versioning + Monitoring Depth
Client: Max Ehrlich, Stoffwechsel (Berlin, Germany)
Project: Build a versioned feature pipeline with DVC, embedding-based features, and a production monitoring dashboard for a sustainable fashion recommendation system.
What you are building
A recommendation feature pipeline with DVC-tracked features (tabular and embedding-based), a feature store architecture that separates feature computation from model training, and a production monitoring dashboard that tracks recommendation quality across customer segments with awareness of delayed ground truth.
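One way to read "awareness of delayed ground truth": the outcome of a recommendation (a purchase, a return) may not be known for days, so quality metrics should count only events old enough to have an outcome. The following is a minimal sketch, not a prescribed design. The column names (recommended_at, segment, purchased) and the 14-day maturity window are assumptions for illustration; none of them come from the brief or the data files.

```python
import pandas as pd

# Hypothetical maturity window: the brief does not specify how long
# ground truth takes to arrive.
MATURITY_WINDOW = pd.Timedelta(days=14)


def split_by_maturity(events: pd.DataFrame, as_of: pd.Timestamp):
    """Separate events whose outcome window has closed from those still pending."""
    matured_mask = events["recommended_at"] + MATURITY_WINDOW <= as_of
    return events[matured_mask], events[~matured_mask]


# Made-up rows with hypothetical column names.
events = pd.DataFrame(
    {
        "recommended_at": pd.to_datetime(
            ["2024-01-01", "2024-01-10", "2024-01-20", "2024-01-28"]
        ),
        "segment": ["new", "returning", "new", "returning"],
        "purchased": [1, 0, 1, 0],
    }
)

matured, pending = split_by_maturity(events, pd.Timestamp("2024-01-30"))

# Conversion is computed on matured events only. Pending events are reported
# as a count so a dashboard can show how much of the window is still open.
conversion_rate = matured["purchased"].mean()
pending_count = len(pending)
```

Reporting the pending count next to the metric keeps a recent dip from being read as a quality drop when it may only reflect outcomes that have not arrived yet.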
Tech stack
- Python 3.11
- DVC (Data Version Control -- feature and data versioning)
- sentence-transformers (embedding generation)
- scikit-learn (modeling, evaluation)
- pandas (data manipulation)
- MLflow (experiment tracking)
- Git/GitHub (version control)
File structure
materials/
  transactions.csv -- 15,000 transaction records (purchases, browsing, wishlists, returns)
  products.csv -- 800 product records with descriptions, sustainability certs, seasonal tags
  customers.csv -- 2,000 customer profiles with demographics and preferences
  production-recommendations-sample.csv -- 3,000 production recommendation events with drift signal
  dvc-config-template.yaml -- DVC pipeline configuration template
Work breakdown
- T1: Project setup + client discovery + data profiling + work decomposition planning
- T2: DVC-tracked tabular feature pipeline with composite key for product ID recycling
- T3: Embedding features from product descriptions with train/test boundary enforcement
- T4: Production monitoring dashboard with disaggregated metrics and ground truth delay handling
- T5: System-level verification with meta-prompting + documentation + delivery
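The composite key in T2 can be sketched as follows. This is a small illustration with made-up rows, not the client data. It assumes both tables carry a season column alongside product_id, which the brief does not confirm for transactions.csv; if the season has to be derived from a transaction date instead, that step would come before the join.

```python
import pandas as pd

# Made-up rows: product_id 101 is recycled for a different item in a later season.
products = pd.DataFrame(
    {
        "product_id": [101, 101, 102],
        "season": ["SS23", "AW23", "SS23"],
        "description": ["linen shirt", "wool coat", "hemp tote"],
    }
)
transactions = pd.DataFrame(
    {
        "transaction_id": [1, 2, 3],
        "product_id": [101, 101, 102],
        "season": ["SS23", "AW23", "SS23"],  # assumed column, see lead-in
    }
)

KEY = ["product_id", "season"]

# product_id alone is ambiguous; the composite key should be unique.
assert products["product_id"].duplicated().any()
assert not products.duplicated(subset=KEY).any()

# validate="many_to_one" makes pandas raise MergeError if the key
# is not unique on the products side.
joined = transactions.merge(products, on=KEY, how="left", validate="many_to_one")

# Rows that found no matching product would have a missing description.
unresolved = int(joined["description"].isna().sum())
```

An unresolved count of zero on the real data is one way to check the "resolves all recycled product IDs" target.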
Verification targets
- DVC reproduces features exactly from a previous commit (dvc repro)
- Composite key (product_id + season) resolves all recycled product IDs
- Embedding features are fit on training data only, then applied to the test set
- Combined features (tabular + embeddings) outperform tabular-only on at least one metric
- Simulated data shift triggers visible alert in monitoring dashboard
- Disaggregated metrics show shift affecting specific customer segments differently
- Dashboard uses business language, not raw technical metric names
- README documents the full system with reproduction instructions
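The shift-alert and disaggregation targets can be illustrated with a small sketch. It compares a per-segment acceptance rate between a baseline period and the current period and flags segments whose rate drops past a threshold. The column names (period, segment, accepted), the segment labels, and the 10-percentage-point threshold are all assumptions for illustration; the real schema of production-recommendations-sample.csv may differ.

```python
import pandas as pd

# Hypothetical threshold: alert when a segment's acceptance rate falls by
# more than 10 percentage points against the baseline period.
ALERT_DROP = 0.10

# Made-up events: the "new" segment degrades, the "loyal" segment does not.
events = pd.DataFrame(
    {
        "period": ["baseline"] * 6 + ["current"] * 6,
        "segment": ["new", "new", "new", "loyal", "loyal", "loyal"] * 2,
        "accepted": [1, 1, 0, 1, 1, 0,   # baseline
                     0, 0, 1, 1, 1, 0],  # current
    }
)

# One row per segment, one column per period, values are acceptance rates.
rates = events.pivot_table(
    index="segment", columns="period", values="accepted", aggfunc="mean"
)
rates["drop"] = rates["baseline"] - rates["current"]
rates["alert"] = rates["drop"] > ALERT_DROP

# The aggregate drop averages over segments and understates the hit to "new".
overall_drop = (
    events.loc[events["period"] == "baseline", "accepted"].mean()
    - events.loc[events["period"] == "current", "accepted"].mean()
)
```

In this toy data the overall drop is about half the drop seen in the affected segment, which is the kind of gap the disaggregated view is meant to expose.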
Commit convention
Commit after completing each ticket. Use descriptive messages: "T2: build DVC-tracked tabular feature pipeline with composite key", not "add features".