Learn by Directing AI

The Brief

Max Ehrlich co-founded Stoffwechsel, an online marketplace for sustainable fashion in Berlin. About 2,000 products from 85 brands, rotating constantly with seasonal collections and new designers.

His recommendation system shows "other people bought this." It has not changed in three years. The catalog changes every week. The recommendations are always stale.

He wants recommendations that feel personal and adapt to catalog changes. His merchandising team wants to understand what drives the recommendations -- is it price, brand, style, sustainability score? And he needs to know if the recommendations are actually working. Not just clicks. Actual purchases people keep.

He has three years of transaction data, product metadata, and customer profiles. The data has quirks he has not thought to mention.

Your Role

You are building a recommendation feature pipeline with versioned features, embedding-based representations, and a production monitoring dashboard.

No guides this time. You receive a brief with deliberate gaps, a dataset with embedded complexity, and a DVC configuration template. You synthesize the brief into a project plan, decide what features to build, choose how to represent product descriptions, design the monitoring system, and pick which metrics to track. Claude's plan mode continues from last time -- use it to map out dependencies before building.
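A DVC pipeline template typically declares a stage per transformation, each with its command, dependencies, and outputs, so DVC can re-run only what changed and version the rest. A minimal sketch of that shape -- the stage names, scripts, and file paths here are hypothetical, not the actual template you receive:

```yaml
# Hypothetical dvc.yaml sketch; stage, script, and file names are illustrative.
stages:
  build_features:
    cmd: python src/build_features.py
    deps:
      - src/build_features.py
      - data/transactions.csv
    outs:
      - data/features.parquet
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/features.parquet
    outs:
      - models/model.pkl
```

Because outputs are declared, `dvc repro` can skip stages whose inputs have not changed, and each Git commit pins an exact feature version.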

What's New

Last time you built CI/CD infrastructure -- eval gates and drift detection. The model was already working; the challenge was keeping it reliable.

This time you build the features the model depends on. DVC enters as a tool for versioning data transformations alongside code -- so you can trace which features produced which results at any point in the project's history. Sentence-transformers enters as a way to convert product descriptions into dense vector features that feed a classical scikit-learn classifier. This is not using an LLM. It is using learned representations as features in a conventional pipeline.

The monitoring deepens too. Drift detection told you when inputs changed. Now you need to know if the recommendations are actually correct -- and that answer may not arrive for weeks. Your monitoring dashboard serves people who did not build the model.
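Because the real outcome -- a purchase that is not returned -- arrives late, the dashboard has to join recommendation events to outcomes that land days or weeks afterward. A minimal pandas sketch under assumed column names and an illustrative 14-day attribution window; the client's actual schema will differ:

```python
# Sketch of delayed-outcome monitoring: join recommendation events to later
# purchases and discount returns. Column names and the 14-day attribution
# window are assumptions for illustration.
import pandas as pd

recs = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "product_id": ["a", "b", "c"],
    "recommended_at": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
})
purchases = pd.DataFrame({
    "customer_id": [1, 3],
    "product_id": ["a", "c"],
    "purchased_at": pd.to_datetime(["2024-01-05", "2024-01-20"]),
    "returned": [False, True],
})

# Left-join so recommendations with no purchase stay in the frame as misses.
joined = recs.merge(purchases, on=["customer_id", "product_id"], how="left")
window = pd.Timedelta(days=14)
joined["converted"] = (
    (joined["purchased_at"] - joined["recommended_at"] <= window)
    & ~joined["returned"].fillna(False).astype(bool)
)
kept_rate = joined["converted"].mean()  # kept purchases per recommendation
print(f"kept-purchase rate: {kept_rate:.2f}")
```

A metric like this is only trustworthy once the window has fully elapsed -- recent recommendations have not had their chance to convert yet, which is exactly the lag the dashboard has to communicate to non-builders.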

The hard part is not any single technique. It is managing the interactions: features that need versioning, embeddings that need boundary discipline, monitoring that needs to communicate, and a client brief with gaps you have to discover through conversation.

Tools

  • Python -- scripting, feature engineering, monitoring
  • DVC -- data and feature versioning (new)
  • sentence-transformers -- embedding generation (new)
  • scikit-learn -- modeling, evaluation (familiar)
  • pandas -- data manipulation (familiar)
  • MLflow -- experiment tracking (familiar)
  • Git / GitHub -- version control (familiar)
  • Claude Code -- AI direction, plan mode (familiar)

Materials

You receive:

  • Transaction data covering three years of purchases, browsing, wishlists, and returns
  • Product metadata with descriptions, sustainability certifications, and seasonal tags
  • Customer profiles with demographics and preferences
  • A sample of production recommendation events for monitoring work
  • A DVC pipeline configuration template
  • A project governance file (CLAUDE.md)