CLAUDE.md

ML P8: Feature Versioning + Monitoring Depth

Client: Max Ehrlich, Stoffwechsel (Berlin, Germany)

Project: Build a versioned feature pipeline with DVC, embedding-based features, and a production monitoring dashboard for a sustainable fashion recommendation system.

What you are building

A recommendation feature pipeline with DVC-tracked features (tabular and embedding-based), a feature store architecture that separates feature computation from model training, and a production monitoring dashboard that tracks recommendation quality across customer segments with awareness of delayed ground truth.

Tech stack

  • Python 3.11
  • DVC (Data Version Control -- feature and data versioning)
  • sentence-transformers (embedding generation)
  • scikit-learn (modeling, evaluation)
  • pandas (data manipulation)
  • MLflow (experiment tracking)
  • Git/GitHub (version control)

File structure

materials/
  transactions.csv              -- 15,000 transaction records (purchases, browsing, wishlists, returns)
  products.csv                  -- 800 product records with descriptions, sustainability certs, seasonal tags
  customers.csv                 -- 2,000 customer profiles with demographics and preferences
  production-recommendations-sample.csv  -- 3,000 production recommendation events with drift signal
  dvc-config-template.yaml      -- DVC pipeline configuration template

Work breakdown

  • T1: Project setup + client discovery + data profiling + work decomposition planning
  • T2: DVC-tracked tabular feature pipeline with composite key for product ID recycling
  • T3: Embedding features from product descriptions with train/test boundary enforcement
  • T4: Production monitoring dashboard with disaggregated metrics and ground truth delay handling
  • T5: System-level verification with meta-prompting + documentation + delivery
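As a rough illustration of the composite-key idea in T2, the sketch below joins transactions to products on (product_id, season) instead of product_id alone. The rows are invented, and every column name other than product_id and season is an assumption. It also assumes transactions already carry a season value; in the real data that may have to be derived from a transaction date.

```python
import pandas as pd

# Hypothetical rows: product_id "P100" is reused for a different item in a later season.
products = pd.DataFrame({
    "product_id": ["P100", "P100", "P200"],
    "season": ["SS23", "AW23", "SS23"],
    "description": ["linen shirt", "wool coat", "hemp tote"],
})
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "product_id": ["P100", "P100", "P200"],
    "season": ["SS23", "AW23", "SS23"],  # assumed to exist or be derived upstream
})

KEY = ["product_id", "season"]

# IDs that appear more than once in the catalogue are the recycled ones.
recycled_ids = (
    products.loc[products["product_id"].duplicated(keep=False), "product_id"]
    .unique()
    .tolist()
)

# The composite key must be unique, otherwise it does not resolve the recycling.
assert not products.duplicated(subset=KEY).any(), "composite key is not unique"

# validate="many_to_one" makes pandas raise if a key matches several product rows;
# indicator=True adds a _merge column so unmatched transactions can be counted.
joined = transactions.merge(
    products, on=KEY, how="left", validate="many_to_one", indicator=True
)
unmatched = int((joined["_merge"] == "left_only").sum())

print("recycled product IDs:", recycled_ids)
print("transactions without a product match:", unmatched)
```

The uniqueness assertion and the unmatched count are the two numbers worth recording during data profiling, since together they show whether the key resolves every recycled ID.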

Verification targets

  • DVC reproduces features exactly from a previous commit (dvc repro)
  • Composite key (product_id + season) resolves all recycled product IDs
  • Embedding generation is fit on training data only, then applied to test data
  • Combined features (tabular + embeddings) outperform tabular-only on at least one metric
  • Simulated data shift triggers visible alert in monitoring dashboard
  • Disaggregated metrics show shift affecting specific customer segments differently
  • Dashboard uses business language, not raw technical metric names
  • README documents the full system with reproduction instructions
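The disaggregation and ground-truth-delay targets above can be sketched as follows. This is a minimal example under stated assumptions, not the dashboard itself. The event rows, the column names (segment, recommended_at, purchased), the as-of date, and the 14-day outcome window are all hypothetical. The idea is to compute the per-segment rate only from events old enough to have a known outcome, and to report the rest as still awaiting one.

```python
import pandas as pd

# Hypothetical recommendation events; purchased=0 on a recent event may only
# mean that the outcome has not arrived yet.
events = pd.DataFrame({
    "segment": ["new", "new", "new", "returning", "returning", "returning"],
    "recommended_at": pd.to_datetime([
        "2024-05-01", "2024-05-02", "2024-05-28",
        "2024-05-01", "2024-05-03", "2024-05-29",
    ]),
    "purchased": [1, 0, 0, 1, 1, 0],
})

AS_OF = pd.Timestamp("2024-06-01")   # assumed dashboard refresh date
LABEL_DELAY = pd.Timedelta(days=14)  # assumed time for ground truth to arrive

# An event is "mature" once the full outcome window has elapsed.
events["mature"] = events["recommended_at"] <= AS_OF - LABEL_DELAY

summary = events.groupby("segment").agg(
    events=("purchased", "size"),
    mature_events=("mature", "sum"),
)
# Rate computed on mature events only, so pending outcomes do not drag it down.
summary["purchase_rate"] = (
    events[events["mature"]].groupby("segment")["purchased"].mean()
)
summary["awaiting_outcome"] = summary["events"] - summary["mature_events"]

print(summary)
```

Showing the awaiting_outcome count next to the rate keeps the delay visible to the reader, rather than letting recent events without outcomes look like a drop in quality.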

Commit convention

Commit after completing each ticket. Use descriptive messages: "T2: build DVC-tracked tabular feature pipeline with composite key", not "add features."