Tickets — Tunde Mobile Churn Prediction
Unit 1: The Brief and the Data
T1: Read and summarize the brief
Load emeka-brief.md and confirm understanding of requirements.
AC: Can state what Emeka needs in one sentence.
T2: Profile the dataset
Load subscribers.csv and print shape, column types, summary statistics, missing values, and class distribution.
AC: Profile output shows row count, column count, types, and churn class split.
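A minimal sketch of the T2 profile, assuming pandas is available and the target column is named `churn` (an assumption; the real name comes from data-dictionary.md). A toy frame stands in for subscribers.csv so the snippet runs on its own, and the `profile` helper is hypothetical.

```python
import pandas as pd

def profile(df: pd.DataFrame, target: str = "churn") -> dict:
    """Collect the facts the AC asks for. `target` is an assumed column name."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "class_split": df[target].value_counts(normalize=True).to_dict(),
    }

# Toy data standing in for subscribers.csv (column names are illustrative).
toy = pd.DataFrame({
    "tenure_months": [1, 12, 30, None],
    "plan": ["prepaid", "postpaid", "prepaid", "prepaid"],
    "churn": [1, 0, 0, 0],
})

report = profile(toy)
for key, value in report.items():
    print(key, value)
print(toy.describe())  # summary statistics for the numeric columns
```

On the real file, `pd.read_csv("subscribers.csv")` would replace the toy frame.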
T3: Review data dictionary
Compare profile output against data-dictionary.md.
AC: Columns match, types are correct, and no unexpected values appear.
Unit 2: Preprocessing and Splitting
T4: Impute missing values
Handle missing values with an appropriate strategy for each column.
AC: No nulls remain. Imputation strategy is documented.
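One possible approach to T4, sketched under assumptions: median for numeric columns and most frequent value for everything else. The ticket does not prescribe these strategies, and the `impute` helper and column names are hypothetical. The returned dictionary doubles as the documentation the AC asks for.

```python
import pandas as pd

def impute(df: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    """Fill nulls column by column and record which strategy was used."""
    out = df.copy()
    strategy = {}
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
            strategy[col] = "median"
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
            strategy[col] = "mode"
    return out, strategy

# Toy data with one numeric and one categorical gap.
toy = pd.DataFrame({
    "tenure_months": [1, 12, 30, None],
    "plan": ["prepaid", None, "prepaid", "postpaid"],
    "churn": [1, 0, 0, 0],
})

imputed, strategy = impute(toy)
print(strategy)
print(imputed.isna().sum().sum())  # total nulls left after imputation
```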
T5: Encode categorical features
Apply one-hot or ordinal encoding as appropriate.
AC: All features numeric. Encoding choices documented.
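A short sketch of both encodings named in T5, assuming pandas. The column names and the `low < medium < high` ordering are invented for illustration; which columns are nominal and which are ordinal depends on data-dictionary.md.

```python
import pandas as pd

toy = pd.DataFrame({
    "plan": ["prepaid", "postpaid", "prepaid"],   # nominal: no natural order
    "data_tier": ["low", "high", "medium"],       # ordinal: has a natural order
    "tenure_months": [1, 12, 30],
})

# Ordinal encoding: map each level to its rank (assumed ordering).
tier_order = {"low": 0, "medium": 1, "high": 2}
encoded = toy.copy()
encoded["data_tier"] = encoded["data_tier"].map(tier_order)

# One-hot encoding: one 0/1 column per category of the nominal feature.
encoded = pd.get_dummies(encoded, columns=["plan"], dtype=int)

print(encoded)
print(encoded.dtypes)
```

One-hot suits categories without an order; ordinal encoding keeps the order as a single numeric column.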
T6: Scale and split
Scale numerical features and perform stratified train/test split (80/20, random_state=42).
AC: Train and test sets exist. Churn proportion within 1 percentage point of original in both sets.
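The split and the AC check for T6 might look like the sketch below, assuming scikit-learn and NumPy. The data is synthetic, and fitting the scaler on the training rows only is a choice made here to keep test-set statistics out of training; the ticket does not specify the order.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1000 rows, 3 numeric features, roughly 20% churn.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
y = (rng.random(1000) < 0.2).astype(int)

# Stratified 80/20 split with the seed named in the ticket.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit scaling parameters on the training set, then apply to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# AC check: churn proportion within 1 percentage point of the original.
for name, part in (("train", y_train), ("test", y_test)):
    gap = abs(part.mean() - y.mean())
    print(f"{name}: churn={part.mean():.3f} gap={gap:.4f} ok={gap < 0.01}")
```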
Unit 3: Training and Evaluation
T7: Train the model
Train RandomForestClassifier with class_weight='balanced' and random_state=42.
AC: Model object exists, training completes without error.
T8: Evaluate the model
Generate confusion matrix and classification report on the test set.
AC: Churn class recall >= 0.55.
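A sketch of T7 and T8 together, assuming scikit-learn. The data comes from `make_classification`, so the numbers it prints say nothing about the real subscribers; the label names `stay` and `churn` and the encoding of churn as class 1 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the preprocessed subscriber data.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# T7: the model configuration named in the ticket.
model = RandomForestClassifier(class_weight="balanced", random_state=42)
model.fit(X_train, y_train)

# T8: evaluate on the held-out test set only.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows = actual, columns = predicted
print(cm)
print(classification_report(y_test, y_pred, target_names=["stay", "churn"]))

# AC check: recall for the churn class (assumed to be label 1).
churn_recall = recall_score(y_test, y_pred, pos_label=1)
print(f"churn recall = {churn_recall:.2f}, meets AC: {churn_recall >= 0.55}")
```

Recall is the share of actual churners the model catches, which is the bottom-right cell of the matrix divided by the sum of the bottom row.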
T9: Extract feature importances
Print ranked feature importances from the trained model.
AC: Feature importance list produced, top features identified.
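Ranking the importances for T9 could be sketched as follows, assuming scikit-learn and pandas. The feature names are placeholders and the model is trained on synthetic data, so the ranking is illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names; real ones come from the preprocessed training frame.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=42
)
model = RandomForestClassifier(class_weight="balanced", random_state=42)
model.fit(X, y)

# Pair each importance with its feature name and sort from high to low.
ranked = pd.Series(model.feature_importances_, index=feature_names)
ranked = ranked.sort_values(ascending=False)

print(ranked)
print("top 3:", list(ranked.index[:3]))
```

These are impurity-based importances, which sum to 1 across features.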
Unit 4: Serving the Model
T10: Build the API endpoint
Create a FastAPI app that loads the trained model, accepts subscriber features as JSON, and returns churn probability and binary prediction.
AC: Server starts, responds to valid requests with probability between 0 and 1.
T11: Test the endpoint
Test with valid input (curl), with missing features, and with wrong types.
AC: Valid input returns 200 with a probability. Invalid input returns an appropriate error.
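The two invalid cases in T11 are both schema violations, and FastAPI rejects those with a 422 response before the handler runs. The sketch below exercises the same validation directly through Pydantic, without a running server, to show which field each bad payload trips on. The two-field schema and the `invalid_fields` helper are hypothetical.

```python
from pydantic import BaseModel, ValidationError

class Subscriber(BaseModel):
    # Hypothetical, shortened schema for illustration.
    tenure_months: float
    monthly_spend: float

def invalid_fields(payload: dict) -> list:
    """Return the names of fields that fail validation (empty if valid)."""
    try:
        Subscriber(**payload)
        return []
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]

valid = {"tenure_months": 3, "monthly_spend": 10.5}
missing = {"tenure_months": 3}                              # feature left out
wrong_type = {"tenure_months": "abc", "monthly_spend": 1}   # not a number

for name, payload in (("valid", valid), ("missing", missing), ("wrong_type", wrong_type)):
    print(name, invalid_fields(payload))
```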
T12: Add request logging
Log each prediction request and response with timestamp.
AC: Log file records predictions after curl requests.
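One way to approach T12 with only the standard library is sketched below. The logger name, the JSON line layout, and the `predictions.log` file name are assumptions; the snippet writes to a temporary directory so it can run anywhere.

```python
import json
import logging
import os
import tempfile

def build_prediction_logger(path: str) -> logging.Logger:
    """Create a logger that appends timestamped lines to `path`."""
    logger = logging.getLogger("predictions")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on re-import
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger

def log_prediction(logger: logging.Logger, request: dict, response: dict) -> None:
    """Write one request/response pair as a single JSON line."""
    logger.info(json.dumps({"request": request, "response": response}))

log_path = os.path.join(tempfile.mkdtemp(), "predictions.log")
logger = build_prediction_logger(log_path)
log_prediction(
    logger,
    {"tenure_months": 3},
    {"churn_probability": 0.42, "churn": False},
)

with open(log_path) as f:
    lines = f.read().splitlines()
print(lines[-1])
```

Inside the API, `log_prediction` would be called from the prediction handler just before it returns.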
Unit 5: Project Close
No tickets. Unit 5 covers committing to Git, pushing to GitHub, writing the README, and delivering results to Emeka.