ML P8: Feature Versioning + Monitoring Depth

Client: Max Ehrlich, Stoffwechsel (Berlin, Germany) Project: Build a versioned feature pipeline with DVC, embedding-based features, and a production monitoring dashboard for a sustainable fashion recommendation system.

What you are building

A recommendation feature pipeline with DVC-tracked features (tabular and embedding-based), a feature store architecture that separates feature computation from model training, and a production monitoring dashboard that tracks recommendation quality across customer segments with awareness of delayed ground truth.

Tech stack

Python 3.11
DVC (Data Version Control -- feature and data versioning)
sentence-transformers (embedding generation)
scikit-learn (modeling, evaluation)
pandas (data manipulation)
MLflow (experiment tracking)
Git/GitHub (version control)

File structure

materials/
  transactions.csv              -- 15,000 transaction records (purchases, browsing, wishlists, returns)
  products.csv                  -- 800 product records with descriptions, sustainability certs, seasonal tags
  customers.csv                 -- 2,000 customer profiles with demographics and preferences
  production-recommendations-sample.csv  -- 3,000 production recommendation events with drift signal
  dvc-config-template.yaml      -- DVC pipeline configuration template

Work breakdown

T1: Project setup + client discovery + data profiling + work decomposition planning
T2: DVC-tracked tabular feature pipeline with composite key for product ID recycling
T3: Embedding features from product descriptions with train/test boundary enforcement
T4: Production monitoring dashboard with disaggregated metrics and ground truth delay handling
T5: System-level verification with meta-prompting + documentation + delivery

Verification targets

DVC reproduces features exactly from a previous commit (dvc repro)
Composite key (product_id + season) resolves all recycled product IDs
Embedding generation runs only on training data before being applied to test
Combined features (tabular + embeddings) outperform tabular-only on at least one metric
Simulated data shift triggers visible alert in monitoring dashboard
Disaggregated metrics show shift affecting specific customer segments differently
Dashboard uses business language, not raw technical metric names
README documents the full system with reproduction instructions

Commit convention

Commit after completing each ticket. Use descriptive messages: "T2: build DVC-tracked tabular feature pipeline with composite key" not "add features."