Ticket Breakdown -- ML P2: Churn Prediction Phase 2
Unit 1: The Brief and the Plan
T01: Project setup -- Download and extract project materials. Read CLAUDE.md. Verify all materials are present. Acceptance: Materials directory contains CLAUDE.md, emeka-followup.md, prd-template.md, subscribers-v2.csv, data-dictionary-v2.md, and this file.
T02: Client discovery -- Read Emeka's follow-up email. Open the chat with Emeka and ask about prepaid customer behavior and board expectations. Acceptance: Emeka has confirmed the prepaid behavior details and board documentation needs.
T03: Data profiling -- Load subscribers-v2.csv and generate a full data profile. Compare against the P1 dataset: row count, column changes, and churn distribution overall and by segment. Acceptance: Data profile complete. Segment-level churn rates documented (overall, prepaid, postpaid).
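The segment-level churn rates for T03 can be sketched with pandas. This is a minimal illustration on a toy frame; the column names `churned` and `segment` are assumptions, so check data-dictionary-v2.md for the real schema before running this against subscribers-v2.csv.

```python
import pandas as pd

# Toy stand-in for subscribers-v2.csv; `churned` (0/1) and `segment`
# ("prepaid"/"postpaid") are assumed column names.
df = pd.DataFrame({
    "segment": ["prepaid", "prepaid", "prepaid", "postpaid", "postpaid", "postpaid"],
    "churned": [1, 1, 0, 1, 0, 0],
})

# Churn rate overall and broken out per segment.
overall = df["churned"].mean()
by_segment = df.groupby("segment")["churned"].mean()
print(f"overall churn rate: {overall:.1%}")
print(by_segment.to_string())
```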
T04: PRD creation -- Using prd-template.md, draft a PRD that captures: the prepaid churn gap, selected evaluation metrics with rationale, success criteria tied to business needs, and the board documentation requirement. Acceptance: PRD document exists with all sections filled. Problem statement names the prepaid gap. Success criteria include specific metric thresholds.
Unit 2: Evaluation Design
T05: Evaluation metric selection -- Choose primary and secondary evaluation metrics for this problem. Document the rationale for each choice, including why accuracy is inappropriate for this dataset. Acceptance: Metric choices documented with rationale referencing class imbalance and the prepaid problem.
T06: Baseline computation -- Compute majority-class predictor performance and logistic regression baseline performance. Log both to establish the floor. Acceptance: Baseline numbers documented: majority-class accuracy and recall, logistic regression overall and per-segment metrics.
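A sketch of the T06 baselines on synthetic data, assuming roughly 20% churn (the real class balance comes from T03). It also demonstrates why accuracy is misleading here: the majority-class predictor scores high accuracy with zero recall on churners.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the real dataset (~20% positives).
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Floor 1: always predict the majority (non-churn) class.
majority = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
# Floor 2: plain logistic regression.
logreg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("majority-class", majority), ("logistic regression", logreg)]:
    pred = model.predict(X_te)
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.3f} "
          f"recall={recall_score(y_te, pred):.3f}")
```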
T07: Per-segment evaluation plan -- Design evaluation that reports metrics separately for prepaid and postpaid segments. Define what "improvement" means for each segment. Acceptance: Evaluation design document exists with per-segment evaluation plan and baseline comparison.
Unit 3: Data Preparation
T08: Missing value analysis -- Examine distributions of columns with missing values (monthly_charges, total_charges, complaints_count). Determine appropriate imputation strategy for each based on the distribution shape. Acceptance: Distribution analysis complete for each column with missing values. Imputation strategy chosen with rationale.
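One way to tie the T08 imputation choice to distribution shape is a skewness check: a heavy tail drags the mean, so skewed columns are usually better imputed with the median. The values below are hypothetical, not from subscribers-v2.csv.

```python
import pandas as pd

# Hypothetical charges column with an outlier and missing values.
charges = pd.Series([20.0, 22.5, 25.0, 24.0, 21.0, 310.0, None, None])

skew = charges.skew()  # NaNs are skipped by default
# Rule of thumb: |skew| > 1 indicates a strongly skewed distribution,
# where median imputation is the safer choice.
strategy = "median" if abs(skew) > 1 else "mean"
fill = charges.median() if strategy == "median" else charges.mean()
print(f"skew={skew:.2f} -> impute with {strategy} ({fill:.2f})")
```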
T09: Encoding decisions -- For each categorical column (plan_type, payment_method, contract_type, segment), determine whether the variable is nominal or ordinal and choose the appropriate encoding. Acceptance: Encoding decision documented for each categorical column with rationale.
T10: Implement preprocessing -- Direct AI to implement the chosen imputation, encoding, and scaling strategies. Apply to the dataset. Acceptance: Preprocessing code runs without errors. All missing values handled, categoricals encoded, numerics scaled.
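The T08-T10 choices can be wired together as a single scikit-learn `ColumnTransformer`, which also helps with T11: because the transformer is fit only on training data, the test set cannot leak into the imputation or scaling statistics. Column names and category values below are assumptions drawn from the column lists above; the toy frame exists only to show the pipeline runs.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed schema -- adjust to data-dictionary-v2.md.
numeric_cols = ["monthly_charges", "total_charges", "complaints_count"]
categorical_cols = ["plan_type", "payment_method", "contract_type", "segment"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # per T08 rationale
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),  # nominal columns
    ]), categorical_cols),
])

# Toy frame with missing values to exercise the pipeline.
df = pd.DataFrame({
    "monthly_charges": [20.0, None, 35.0],
    "total_charges": [200.0, 150.0, None],
    "complaints_count": [0.0, 2.0, 1.0],
    "plan_type": ["basic", "plus", "basic"],
    "payment_method": ["card", "cash", "card"],
    "contract_type": ["monthly", "annual", "monthly"],
    "segment": ["prepaid", "prepaid", "postpaid"],
})
X = preprocess.fit_transform(df)  # fit on training data only in the real flow
print(X.shape)
```

In the real pipeline, call `fit_transform` on the training split and plain `transform` on the test split.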
T11: AI self-review of preprocessing -- Prompt Claude to review the preprocessing pipeline: "List every transformation applied before the train/test split and confirm none of them use information from the test set." Acceptance: Self-review output documents each transformation and its data leakage safety status.
T12: Stratified split -- Perform a stratified train/test split preserving both churn class distribution and segment distribution. Acceptance: Train and test sets exist. Churn proportion within 1 percentage point of original in both sets.
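Stratifying on two variables at once, as T12 requires, can be done by building a joint churn-by-segment key and passing it to `stratify`. Column names are assumed as before; the toy frame just demonstrates the mechanism.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame; `churned` and `segment` are assumed column names.
df = pd.DataFrame({
    "churned": [0, 1] * 20,
    "segment": (["prepaid"] * 2 + ["postpaid"] * 2) * 10,
})

# Stratify on the joint churn x segment key so both distributions
# survive the split.
strata = df["churned"].astype(str) + "_" + df["segment"]
train, test = train_test_split(df, test_size=0.25, stratify=strata, random_state=42)

print(df["churned"].mean(), train["churned"].mean(), test["churned"].mean())
```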
T13: Document preprocessing decisions -- Write a preprocessing decisions document explaining each encoding, imputation, and scaling choice with rationale. Acceptance: Document exists with at least three specific choices and rationales.
Unit 4: Training and Experiments
T14: MLflow setup -- Configure MLflow experiment tracking for this project. Set up experiment name, tracking URI, and artifact logging. Acceptance: MLflow experiment created. A test run can be logged and viewed.
T15: Train baseline logistic regression -- Train a logistic regression model. Log parameters, metrics (overall and per-segment), and model artifact to MLflow. Acceptance: MLflow shows a logged run with parameters and metrics for logistic regression.
T16: Train RandomForest -- Train a RandomForest classifier. Log parameters, metrics (overall and per-segment), and model artifact to MLflow. Acceptance: MLflow shows a logged run with parameters and metrics for RandomForest.
T17: Hyperparameter tuning -- Tune RandomForest hyperparameters using cross-validation. Check whether fold variance exceeds the hyperparameter effect. Log the best configuration to MLflow. Acceptance: Tuning results logged. Fold variance documented alongside mean scores.
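The fold-variance check in T17 falls out of `GridSearchCV`'s `cv_results_`: compare `std_test_score` per configuration against the spread of `mean_test_score` across configurations. If the per-fold standard deviation is as large as the spread, the "best" parameters may be noise. Synthetic data and a small illustrative grid below; the real grid is a project decision.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [4, None], "min_samples_leaf": [1, 5]},
    scoring="recall",  # matches the prepaid-recall focus
    cv=5,
)
grid.fit(X, y)

res = grid.cv_results_
for params, mean, std in zip(res["params"], res["mean_test_score"],
                             res["std_test_score"]):
    print(f"{params}: mean={mean:.3f} std={std:.3f}")

# Hyperparameter effect vs. fold noise.
spread = res["mean_test_score"].max() - res["mean_test_score"].min()
print(f"spread across configs: {spread:.3f}")
```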
T18: Per-segment evaluation -- Evaluate the best model separately on prepaid and postpaid customers. Report recall, precision, F1 for each segment. Acceptance: Per-segment metrics computed and documented. Prepaid recall reported separately.
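Per-segment metrics for T18 reduce to masking the held-out labels and predictions by segment before scoring. The arrays below are hypothetical stand-ins for the real test-set outputs.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical held-out labels, predictions, and segment tags.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
segments = np.array(["prepaid", "prepaid", "prepaid", "prepaid",
                     "postpaid", "postpaid", "postpaid", "postpaid"])

for seg in ("prepaid", "postpaid"):
    mask = segments == seg
    p, r, f1, _ = precision_recall_fscore_support(
        y_true[mask], y_pred[mask], average="binary", zero_division=0
    )
    print(f"{seg}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```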
T19: Experiment comparison -- Use MLflow to compare all logged runs. Identify the best model for the prepaid problem. Acceptance: Comparison documented with rationale for model selection.
Unit 5: Serving and Documentation
T20: Serve best model -- Build a FastAPI endpoint serving the best model. Accept subscriber features as JSON, return churn probability and binary prediction. Test with curl. Acceptance: Endpoint returns 200 with churn probability for valid requests.
T21: Evaluation documentation -- Write board-facing evaluation results documentation. Include: executive summary, approach, per-segment results in business language, baseline comparisons, recommendations. Acceptance: Documentation uses business language (not raw metric names). Per-segment results translated for non-technical readers.
T22: Update PRD with actual results -- Compare actual model performance against the success criteria defined in the PRD. Note where criteria were met and where they fell short. Acceptance: PRD updated with actual results section.
T23: Client documentation review -- Send evaluation documentation to Emeka for board readiness review. Incorporate his feedback on language and presentation. Acceptance: Emeka has reviewed and approved the documentation for board presentation.
Unit 6: Project Close
T24: Write README -- Create a project README describing what was built, the approach, and how to run the system. Acceptance: README exists with project description, setup instructions, and usage.
T25: Final commit and project close -- Ensure all project files are committed with clear commit messages. Verify all artifacts are present in the repository. Acceptance: All files committed. Repository contains: PRD, evaluation design, preprocessing decisions, evaluation documentation, model code, serving endpoint, README.