P3 Tickets: Infrastructure Foundation

Group 1: Input Validation & Error Handling

T-01: Add Pydantic input validation with training data ranges

Add a Pydantic model that validates prediction requests against the training data's actual ranges and types. The validation should encode what the model was trained on -- not arbitrary limits.

Acceptance criteria:

  • Pydantic model validates all input features
  • Numeric fields have min/max constraints matching the training data profile (data_profile.json)
  • Categorical fields accept only values present in the training data
  • Invalid requests return 422 with a structured error response
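A minimal sketch of such a model, assuming illustrative field names (`tenure_months` and its 1–72 range come from the tickets; `monthly_charges` and the `prepaid`/`postpaid` categories are assumptions standing in for the real features from data_profile.json):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class PredictionRequest(BaseModel):
    """Validates inputs against the ranges observed in the training data."""

    # Range taken from the ticket's example constraint (1-72 months).
    tenure_months: int = Field(ge=1, le=72)
    # Hypothetical numeric feature with an assumed training-data range.
    monthly_charges: float = Field(ge=0.0, le=500.0)
    # Categorical field restricted to values present in the training data.
    plan_type: Literal["prepaid", "postpaid"]


# A valid request passes validation.
ok = PredictionRequest(tenure_months=12, monthly_charges=49.9, plan_type="prepaid")

# An out-of-range request raises ValidationError, which FastAPI turns into a 422.
try:
    PredictionRequest(tenure_months=-5, monthly_charges=49.9, plan_type="prepaid")
except ValidationError as exc:
    errors = exc.errors()  # structured list: field location, constraint, message
```

Declaring the request as a FastAPI endpoint parameter of this type gets the 422 behavior for free; the constraints themselves should be generated from or checked against data_profile.json rather than typed by hand.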

T-02: Add structured error responses

Replace default FastAPI error handling with structured JSON error responses. Error responses should tell callers what went wrong and how to fix it -- not expose stack traces or internal paths.

Acceptance criteria:

  • All error responses return structured JSON with field name, constraint, and received value
  • No stack traces or internal file paths in any error response
  • Error messages use plain language (e.g., "field 'tenure_months' must be between 1 and 72")
  • HTTP status codes are appropriate (422 for validation, 500 for server errors)
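One way to meet these criteria is to reshape Pydantic's error list into the structured body before it leaves the service; the sketch below is framework-agnostic (in FastAPI this logic would live in a `RequestValidationError` exception handler), and the single-field model is a placeholder:

```python
from pydantic import BaseModel, Field, ValidationError


class PredictionRequest(BaseModel):
    tenure_months: int = Field(ge=1, le=72)


def to_error_response(exc: ValidationError) -> dict:
    """Reshape Pydantic errors into plain-language structured JSON.

    Exposes only field, constraint, message, and received value --
    never stack traces or internal file paths.
    """
    details = []
    for err in exc.errors():
        details.append({
            "field": ".".join(str(part) for part in err["loc"]),
            "constraint": err["type"],
            "message": err["msg"],
            "received": err.get("input"),  # present in Pydantic v2; None in v1
        })
    return {"error": "validation_failed", "details": details}


try:
    PredictionRequest(tenure_months=999)
except ValidationError as exc:
    body = to_error_response(exc)  # served to the caller with HTTP 422
```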

T-03: Test validation with valid, boundary, and invalid inputs

Exercise the validation layer with three classes of input: clearly valid requests, values at the boundaries, and clearly invalid requests.

Acceptance criteria:

  • Valid request returns 200 with prediction
  • Boundary request (e.g., tenure_months=1, tenure_months=72) returns 200
  • Out-of-range request (e.g., tenure_months=-5) returns 422 with structured error
  • Missing required field returns 422 with structured error
  • Wrong type (e.g., string where number expected) returns 422 with structured error
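These cases can be exercised directly against the validation model, independent of the HTTP layer (full end-to-end tests would go through the FastAPI test client); the single-field model here is a stand-in for the real one:

```python
from pydantic import BaseModel, Field, ValidationError


class PredictionRequest(BaseModel):
    tenure_months: int = Field(ge=1, le=72)


def is_valid(payload: dict) -> bool:
    """True when the payload passes validation, False when it would 422."""
    try:
        PredictionRequest(**payload)
        return True
    except ValidationError:
        return False


# Clearly valid.
assert is_valid({"tenure_months": 12})
# Boundary values are accepted.
assert is_valid({"tenure_months": 1})
assert is_valid({"tenure_months": 72})
# Out-of-range values are rejected.
assert not is_valid({"tenure_months": -5})
# Missing required field is rejected.
assert not is_valid({})
# Wrong type (non-numeric string) is rejected.
assert not is_valid({"tenure_months": "twelve"})
```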

Group 2: Health Monitoring & Versioning

T-04: Add health check endpoint that verifies model is loaded

Add a /health endpoint that checks whether the model is actually loaded and functional -- not just whether the server process is running.

Acceptance criteria:

  • /health endpoint exists at GET /health
  • Response includes model_loaded status (true/false)
  • Response includes model_version when loaded
  • Health check verifies the model can produce a prediction on a reference input
  • Returns 200 when healthy, 503 when unhealthy
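The core check can be written as a framework-agnostic function that an endpoint then wraps; the sketch below assumes a scikit-learn-style `predict` interface, and `FakeModel` is a test stand-in, not the real artifact:

```python
from typing import Any, Optional, Tuple


def health_check(model: Optional[Any], model_version: str,
                 reference_input: list) -> Tuple[int, dict]:
    """Return (status_code, body) for GET /health.

    Healthy means the model is loaded AND can produce a prediction on a
    known-good reference input -- not just that the process is running.
    """
    if model is None:
        return 503, {"model_loaded": False}
    try:
        model.predict([reference_input])
    except Exception:
        return 503, {"model_loaded": False}
    return 200, {"model_loaded": True, "model_version": model_version}


class FakeModel:
    """Stand-in with the same predict() surface as the real model."""

    def predict(self, rows):
        return [0 for _ in rows]


status, body = health_check(FakeModel(), "1.2.0", [12, 49.9])
```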

T-05: Add model versioning to prediction responses

Include the model version in every prediction response so that any prediction seen in production can be traced back to the exact model that produced it.

Acceptance criteria:

  • Every prediction response includes a model_version field
  • The version comes from model metadata or a version file (not hardcoded)
  • The version is consistent with the /health endpoint's reported version
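One way to keep the version out of the code is to read it from a metadata file that ships with the model artifact; the filename and field name below are assumptions:

```python
import json
import tempfile
from pathlib import Path


def load_model_version(metadata_path: Path) -> str:
    """Read the version from model metadata rather than hardcoding it."""
    return json.loads(metadata_path.read_text())["model_version"]


# Because /predict and /health both call this loader, their reported
# versions stay consistent by construction.
with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "model_metadata.json"
    meta.write_text(json.dumps({"model_version": "2025.01.15-a3f9"}))
    version = load_model_version(meta)
    response = {"prediction": 1, "model_version": version}
```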

T-06: Test health check by simulating model absence

Verify the health check detects real failure states.

Acceptance criteria:

  • Remove or rename the model file
  • /health returns 503 with model_loaded: false
  • /predict returns an appropriate error (not a stack trace)
  • Restore the model file
  • /health returns 200 with model_loaded: true
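The remove-then-restore cycle can be simulated in a test without touching the real artifact, assuming a loader that returns None for a missing file (the loader name and pickled stub are hypothetical):

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


def load_model(path: Path) -> Optional[Any]:
    """Return the model, or None when the file is absent or unreadable."""
    try:
        return pickle.loads(path.read_bytes())
    except (FileNotFoundError, pickle.UnpicklingError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"

    # Model file absent: /health should report 503 with model_loaded: false.
    missing = load_model(model_path)

    # File "restored": /health should report 200 with model_loaded: true.
    model_path.write_bytes(pickle.dumps({"kind": "stub-model"}))
    restored = load_model(model_path)
```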

Group 3: Experiment Tracking Infrastructure

T-07: Set up MLflow experiment with structured logging

Configure MLflow to log all experiment parameters, metrics, and model artifacts systematically.

Acceptance criteria:

  • MLflow experiment created with a descriptive name
  • All hyperparameters logged as parameters (not hardcoded values)
  • All evaluation metrics logged (overall and per-segment)
  • Model artifact logged
  • Data version or identifier logged
  • MLflow runs use context managers (with mlflow.start_run())

T-08: Verify MLflow logs actual variable values

Check that every mlflow.log_param and mlflow.log_metric call logs the actual variable, not a hardcoded value.

Acceptance criteria:

  • Review every log_param call -- each must reference a variable, not a literal
  • Review every log_metric call -- each must reference a computed metric variable
  • No log_metric call logs training accuracy as test accuracy
  • All runs are properly closed (context manager or explicit end_run)

T-09: Run controlled experiment with two model variants

Train two model variants with different hyperparameters and log both to MLflow.

Acceptance criteria:

  • Two experiments run with different hyperparameters
  • Both logged to MLflow with distinct run IDs
  • Parameters differ between runs
  • Metrics differ between runs
  • Both have model artifacts logged
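The variant loop might look like the sketch below, with synthetic data standing in for the real dataset and MLflow calls omitted (each loop body would sit inside its own `with mlflow.start_run():`, producing the two distinct run IDs):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real training data.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Two variants differing only in their hyperparameters.
variants = {
    "small": {"n_estimators": 20, "max_depth": 3},
    "large": {"n_estimators": 200, "max_depth": None},
}

results = {}
for name, params in variants.items():
    model = RandomForestClassifier(random_state=42, **params)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
```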

T-10: Compare experiments in MLflow UI

Open the MLflow UI and compare the two runs.

Acceptance criteria:

  • MLflow UI accessible at localhost:5000
  • Both runs visible in the experiment view
  • Parameter and metric columns visible for comparison
  • Per-segment metrics (prepaid_recall, postpaid_recall) visible

Group 4: Reproducibility

T-11: Add random seeds for all random operations

Add random seeds for numpy, scikit-learn, and Python's random module to ensure deterministic execution.

Acceptance criteria:

  • numpy random seed set
  • scikit-learn random_state set for all estimators and splitters
  • Python random seed set
  • All seeds use the same base value for traceability
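A sketch of a single seed-setting entry point; `SEED = 42` is an arbitrary illustrative base value:

```python
import random

import numpy as np

SEED = 42  # single base value, logged with the run for traceability


def set_seeds(seed: int = SEED) -> None:
    """Seed every source of randomness the pipeline touches."""
    random.seed(seed)
    np.random.seed(seed)
    # scikit-learn has no global seed: pass random_state=seed explicitly to
    # every estimator and splitter, e.g. train_test_split(..., random_state=seed).


set_seeds()
first = np.random.rand(3)
set_seeds()
second = np.random.rand(3)  # identical to `first`
```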

T-12: Pin all library versions in requirements.txt

Replace unpinned dependencies with exact version pins.

Acceptance criteria:

  • Every dependency in requirements.txt has an exact version (e.g., scikit-learn==1.6.1)
  • Versions are mutually compatible
  • No dependency uses >= or latest
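A possible shape for the pinned file; `scikit-learn==1.6.1` comes from the criteria above, while the other package names and versions are purely illustrative, and the reliable way to get compatible pins is to capture the working environment with `pip freeze > requirements.txt`:

```text
# requirements.txt -- exact pins only, captured from the working environment
scikit-learn==1.6.1
numpy==1.26.4
pandas==2.2.2
fastapi==0.110.0
pydantic==2.7.1
mlflow==2.12.1
uvicorn==0.29.0
```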

T-13: Verify reproducibility

Run the same configuration twice and confirm identical results.

Acceptance criteria:

  • Same config, same seeds, same data produces identical metric values across two runs
  • MLflow shows two runs with identical metrics
  • If results differ, identify which component is non-deterministic and fix it
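The check can be automated by wrapping one full train/eval cycle in a function and running it twice; synthetic data and hyperparameters below are illustrative stand-ins for the real pipeline:

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

SEED = 42


def train_once(seed: int = SEED) -> float:
    """One full train/eval cycle with every seed pinned."""
    random.seed(seed)
    np.random.seed(seed)
    X, y = make_classification(n_samples=300, n_features=6, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))


run_a = train_once()
run_b = train_once()  # must be bit-for-bit identical to run_a
```

If the two values ever diverge, bisect the cycle (data load, split, fit, evaluate) to find the step that ignores its seed.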