ML P2: Churn Prediction Phase 2 -- Tunde Mobile

Client

Emeka Okafor, Head of Customer Retention at Tunde Mobile (Lagos, Nigeria). Returning client from P1. His team uses the P1 churn model weekly, but it misses churn among prepaid customers. His board wants the methodology documented.

What you're building

An improved churn prediction model that handles the prepaid/postpaid segment gap, delivered with a full set of pipeline artifacts: PRD, evaluation design, documented preprocessing decisions, experiment tracking in MLflow, per-segment evaluation, API serving, and board-facing documentation.

Tech stack

Python 3.11+ (conda ml environment)
pandas (data loading, profiling, preprocessing)
scikit-learn (preprocessing, training, evaluation)
MLflow (experiment tracking)
FastAPI + uvicorn (model serving)
Jupyter (notebook workflow)
Git/GitHub (version control)

File structure

ml/p2/
  materials/
    CLAUDE.md              (this file)
    emeka-followup.md      (client email)
    prd-template.md        (PRD template)
    subscribers-v2.csv     (dataset: ~9,000 rows)
    data-dictionary-v2.md  (column reference)
    tickets.md             (ticket breakdown)
  notebooks/               (Jupyter notebooks -- student creates)
  docs/                    (PRD, eval design, preprocessing decisions, eval results)
  src/                     (model code, serving endpoint)

Tickets

T01: Project setup -- download materials, read CLAUDE.md, profile dataset
T02: Client discovery -- ask Emeka about prepaid behavior and board needs
T03: Data profiling -- compare P2 dataset against P1 baseline
T04: PRD creation -- draft requirements document using template
T05: Evaluation design -- choose metrics, define per-segment evaluation, set baselines
T06: Baseline computation -- majority-class and logistic regression baselines
T07: Per-segment evaluation plan -- define prepaid/postpaid separate metrics
T08: Missing value analysis -- examine distributions before choosing imputation
T09: Encoding decisions -- determine nominal vs ordinal for each categorical
T10: Implement preprocessing -- imputation, encoding, scaling
T11: AI self-review -- prompt Claude to verify preprocessing pipeline
T12: Stratified split -- preserve class and segment distributions
T13: Document preprocessing decisions -- rationale for each choice
T14: MLflow setup -- configure experiment tracking
T15: Train baseline logistic regression -- log to MLflow
T16: Train RandomForest -- log to MLflow
T17: Hyperparameter tuning -- cross-validation with fold variance check
T18: Per-segment evaluation -- compute metrics for prepaid and postpaid separately
T19: Experiment comparison -- use MLflow to compare runs
T20: Serve best model -- FastAPI endpoint
T21: Evaluation documentation -- board-facing results summary
T22: Update PRD -- compare actual results against planned criteria
T23: Client documentation review -- send to Emeka for board readiness feedback
T24: Write README
T25: Final commit and project close
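
T07 and T18 both depend on computing recall separately for each segment. The following is a minimal sketch of that computation in plain Python, not the project's prescribed implementation. The function name, the segment labels "prepaid" and "postpaid", and the 1 = churned encoding are assumptions for illustration; the real column names and encodings are in data-dictionary-v2.md.

```python
from collections import defaultdict


def recall_by_segment(y_true, y_pred, segments):
    """Return {segment: recall} for the positive class (assumed 1 = churned).

    A segment with no actual churners maps to None, because recall is
    undefined when there are no positives.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for truth, pred, seg in zip(y_true, y_pred, segments):
        if truth == 1:
            if pred == 1:
                true_pos[seg] += 1
            else:
                false_neg[seg] += 1

    result = {}
    for seg in set(segments):
        positives = true_pos[seg] + false_neg[seg]
        result[seg] = true_pos[seg] / positives if positives else None
    return result


# Toy labels, invented purely to exercise the function (not project data).
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
segments = ["prepaid", "prepaid", "prepaid", "postpaid",
            "postpaid", "postpaid", "prepaid", "postpaid"]

per_segment = recall_by_segment(y_true, y_pred, segments)
for name in sorted(per_segment):
    print(name, per_segment[name])
```

Reporting the two numbers side by side, rather than a single pooled recall, is what makes a prepaid gap visible: a pooled score can look acceptable while one segment is mostly missed.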

Verification targets

PRD includes prepaid gap, evaluation metrics with rationale, board-facing success criteria
Evaluation design has per-segment metrics and baseline scores
Preprocessing decisions document has at least three choices with rationale
Stratified split preserves churn class proportion within 1 percentage point
MLflow has at least two experiment runs with logged parameters and metrics
Per-segment evaluation shows prepaid recall separately from postpaid recall
API endpoint returns JSON with churn probability for valid requests
Evaluation documentation includes per-segment results and baseline comparisons
Repository contains all pipeline artifacts, committed
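
The stratified-split target can be checked mechanically. The sketch below assumes that "within 1 percentage point" means each split's churn rate is compared against the churn rate of the full dataset, and that churn is encoded as 1; both are assumptions, and the helper names are hypothetical. It works on plain label lists, so it is independent of how the split was produced.

```python
def positive_rate(labels):
    """Fraction of labels equal to 1 (assumed to mean churned)."""
    return sum(1 for y in labels if y == 1) / len(labels)


def split_preserves_rate(y_full, y_train, y_test, tolerance_pp=1.0):
    """Check that train and test churn rates stay close to the full-data rate.

    Returns (ok, gaps), where gaps holds the absolute difference for each
    split in percentage points.
    """
    base = positive_rate(y_full)
    gaps = {
        "train": abs(positive_rate(y_train) - base) * 100,
        "test": abs(positive_rate(y_test) - base) * 100,
    }
    ok = all(gap <= tolerance_pp for gap in gaps.values())
    return ok, gaps


# Toy labels, invented for illustration: 20% churn overall.
y_full = [1] * 20 + [0] * 80
good_train, good_test = [1] * 16 + [0] * 64, [1] * 4 + [0] * 16
bad_train, bad_test = [1] * 10 + [0] * 70, [1] * 10 + [0] * 10

print(split_preserves_rate(y_full, good_train, good_test))
print(split_preserves_rate(y_full, bad_train, bad_test))
```

The same check can be repeated on the segment column to confirm that the prepaid/postpaid mix is preserved as well, which T12 also asks for.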

Commit convention

Commit after completing each ticket or logical group of tickets. Use descriptive messages: "Add PRD with evaluation criteria", "Implement preprocessing with documented decisions", "Train models with MLflow tracking".