CLAUDE.md

ML P2: Churn Prediction Phase 2 -- Tunde Mobile

Client

Emeka Okafor, Head of Customer Retention at Tunde Mobile (Lagos, Nigeria). Returning client from P1. His team runs the P1 churn model weekly, but it misses churn among prepaid customers. His board wants the methodology documented.

What you're building

An improved churn prediction model that handles the prepaid/postpaid segment gap, with a full artifact creation pipeline: PRD, evaluation design, documented preprocessing decisions, experiment tracking in MLflow, per-segment evaluation, API serving, and board-facing documentation.

Tech stack

  • Python 3.11+ (conda ml environment)
  • pandas (data loading, profiling, preprocessing)
  • scikit-learn (preprocessing, training, evaluation)
  • MLflow (experiment tracking)
  • FastAPI + uvicorn (model serving)
  • Jupyter (notebook workflow)
  • Git/GitHub (version control)

File structure

ml/p2/
  materials/
    CLAUDE.md              (this file)
    emeka-followup.md      (client email)
    prd-template.md        (PRD template)
    subscribers-v2.csv     (dataset: ~9,000 rows)
    data-dictionary-v2.md  (column reference)
    tickets.md             (ticket breakdown)
  notebooks/               (Jupyter notebooks -- student creates)
  docs/                    (PRD, eval design, preprocessing decisions, eval results)
  src/                     (model code, serving endpoint)

Tickets

  1. T01: Project setup -- download materials, read CLAUDE.md, profile dataset
  2. T02: Client discovery -- ask Emeka about prepaid behavior and board needs
  3. T03: Data profiling -- compare P2 dataset against P1 baseline
  4. T04: PRD creation -- draft requirements document using template
  5. T05: Evaluation design -- choose metrics, define per-segment evaluation, set baselines
  6. T06: Baseline computation -- majority-class and logistic regression baselines
  7. T07: Per-segment evaluation plan -- define separate metrics for prepaid and postpaid
  8. T08: Missing value analysis -- examine distributions before choosing imputation
  9. T09: Encoding decisions -- determine nominal vs ordinal for each categorical feature
  10. T10: Implement preprocessing -- imputation, encoding, scaling
  11. T11: AI self-review -- prompt Claude to verify preprocessing pipeline
  12. T12: Stratified split -- preserve class and segment distributions
  13. T13: Document preprocessing decisions -- rationale for each choice
  14. T14: MLflow setup -- configure experiment tracking
  15. T15: Train baseline logistic regression -- log to MLflow
  16. T16: Train RandomForest -- log to MLflow
  17. T17: Hyperparameter tuning -- cross-validation with fold variance check
  18. T18: Per-segment evaluation -- compute metrics for prepaid and postpaid separately
  19. T19: Experiment comparison -- use MLflow to compare runs
  20. T20: Serve best model -- FastAPI endpoint
  21. T21: Evaluation documentation -- board-facing results summary
  22. T22: Update PRD -- compare actual results against planned criteria
  23. T23: Client documentation review -- send to Emeka for board readiness feedback
  24. T24: Write README
  25. T25: Final commit and project close
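
The per-segment evaluation in T07 and T18 can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses only the Python standard library so it runs anywhere, the function name `recall_by_segment` is hypothetical, churn is assumed to be encoded as label 1, and the toy data is invented. In the notebooks the same calculation would more likely be done with pandas and scikit-learn.

```python
from collections import defaultdict


def recall_by_segment(y_true, y_pred, segments):
    """Return {segment: recall} for the churn class (assumed to be label 1).

    Recall = true positives / actual positives, computed per segment.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for truth, pred, seg in zip(y_true, y_pred, segments):
        if truth == 1:
            actual_pos[seg] += 1
            if pred == 1:
                true_pos[seg] += 1
    # A segment with no actual churners gets None rather than a misleading 0.0.
    return {
        seg: (true_pos[seg] / actual_pos[seg]) if actual_pos[seg] else None
        for seg in set(segments)
    }


# Toy data, invented purely for illustration (not from subscribers-v2.csv).
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 1]
segments = ["prepaid"] * 5 + ["postpaid"] * 3

result = recall_by_segment(y_true, y_pred, segments)
print(result)
```

Reporting the two recalls side by side is what exposes a segment gap: a single pooled recall can look acceptable while one segment is mostly missed.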

Verification targets

  • PRD includes prepaid gap, evaluation metrics with rationale, board-facing success criteria
  • Evaluation design has per-segment metrics and baseline scores
  • Preprocessing decisions document has at least three choices with rationale
  • Stratified split preserves churn class proportion within 1 percentage point
  • MLflow has at least two experiment runs with logged parameters and metrics
  • Per-segment evaluation shows prepaid recall separately from postpaid recall
  • API endpoint returns JSON with churn probability for valid requests
  • Evaluation documentation includes per-segment results and baseline comparisons
  • Repository contains all pipeline artifacts, committed
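
The stratified-split target above can be checked with a short script. The sketch below is an assumption-laden illustration, not the required implementation: the helper names (`stratified_split`, `churn_rate`, `within_one_point`) and the column names `segment` and `churned` are hypothetical, the rows are invented, and it uses only the standard library. In the project itself, scikit-learn's splitting utilities would be the more natural tool.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=42):
    """Split rows into (train, test), taking test_fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def churn_rate(rows):
    """Fraction of rows whose (hypothetical) 'churned' column equals 1."""
    return sum(r["churned"] for r in rows) / len(rows)


def within_one_point(full, part):
    """True if part's churn rate is within 1 percentage point of full's."""
    return abs(churn_rate(full) - churn_rate(part)) <= 0.01


# Toy rows, invented for illustration only.
rows = (
    [{"segment": "prepaid", "churned": 1}] * 30
    + [{"segment": "prepaid", "churned": 0}] * 70
    + [{"segment": "postpaid", "churned": 1}] * 10
    + [{"segment": "postpaid", "churned": 0}] * 90
)

# Stratify on segment and churn label together so both distributions survive.
train, test = stratified_split(rows, key=lambda r: (r["segment"], r["churned"]))

print(len(train), len(test))
print(within_one_point(rows, train), within_one_point(rows, test))
```

Stratifying on the combined (segment, churn) key is one way to preserve both class and segment distributions, as T12 asks; stratifying on churn alone would satisfy the 1 percentage point check but could still skew the prepaid/postpaid mix.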

Commit convention

Commit after completing each ticket or logical group of tickets. Use descriptive messages: "Add PRD with evaluation criteria", "Implement preprocessing with documented decisions", "Train models with MLflow tracking".