
CLAUDE.md

Tunde Mobile Churn Prediction

Build a churn prediction model and serve it as an API for Emeka Okafor's retention team at Tunde Mobile, Lagos.

Client

  • Emeka Okafor — Head of Customer Retention, Tunde Mobile (MVNO, Lagos)
  • Wants: churn predictions ranked by risk, feature importances, and an API endpoint his team can query weekly
  • 200,000 subscribers, losing 2-3% per month
  • Has 12 months of billing/subscriber data

Stack

  • Python 3.11+ (conda environment: ml)
  • pandas — data loading, profiling, preprocessing
  • scikit-learn — RandomForestClassifier, preprocessing, evaluation
  • Jupyter — notebook workflow
  • FastAPI + uvicorn — model serving
  • joblib — model serialization
  • curl — API testing
  • Git/GitHub — version control
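
The handoff between the notebook and the API runs through joblib: the notebook serializes the fitted model, and the FastAPI process loads it at startup. A minimal round-trip sketch, using synthetic data as a stand-in since it doesn't load subscribers.csv:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in training data (the real project fits on subscribers.csv)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Notebook side: persist the fitted model
joblib.dump(model, "churn_model.joblib")

# API side: load at startup and score
loaded = joblib.load("churn_model.joblib")
probs = loaded.predict_proba(X[:1])  # shape (1, 2); column 1 is the positive class
```

The loaded model reproduces the original's predictions exactly, which is what makes training and serving safe to split across processes.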

File structure

materials/
  emeka-brief.md        — Emeka's email brief (the project's starting point)
  data-dictionary.md    — Column-level documentation for subscribers.csv
  subscribers.csv       — 7,043-row subscriber dataset (12 months ending March 2025)
  tickets.md            — Work breakdown: T1-T12
  CLAUDE.md             — This file
  images/               — Project images (populated during authoring)
  scripts/              — Generation scripts

Working files (notebooks, scripts, model artifacts, logs) go in the project root as you create them.

Tickets

  • T1: Read and summarize the brief
  • T2: Profile the dataset
  • T3: Review data dictionary against profile
  • T4: Impute missing values
  • T5: Encode categorical features
  • T6: Scale features and stratified train/test split
  • T7: Train RandomForestClassifier
  • T8: Evaluate model (confusion matrix, classification report)
  • T9: Extract feature importances
  • T10: Build FastAPI endpoint
  • T11: Test endpoint (valid + invalid input)
  • T12: Add request logging

Evaluation targets

  • Churn class recall >= 0.55 on the test set
  • Stratified split preserving ~8% churn rate in both train and test sets
  • API returns HTTP 200 with a probability between 0 and 1 for valid requests
  • Confusion matrix and classification report generated on test set
  • Feature importance ranking produced
  • Endpoint handles invalid input gracefully

Commit convention

  • Meaningful commit messages in imperative mood
  • One commit per logical unit of work
  • Examples:
    • "Add data profiling notebook"
    • "Train RandomForest with balanced class weights"
    • "Build FastAPI churn prediction endpoint"
    • "Add request logging to prediction API"