Tickets: Coffee Yield Prediction
Group 1: Data Profiling and Understanding
T-01: Profile sensor dataset structure and quality
Profile the sensor-data.csv file. Understand: how many farms, what date range, what readings are collected, what the temporal structure looks like (daily readings, seasonal patterns). Identify any data quality issues: missing readings, gaps in coverage, unexpected values.
Acceptance criteria:
- Farm count, date range, and column summary documented
- Sensor gaps identified with which farms and which time periods
- Temporal structure described (daily readings, seasonal patterns)
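A minimal profiling sketch for this ticket, assuming sensor-data.csv has farm_id and date columns plus one column per reading (column names here are illustrative); shown on a tiny synthetic frame rather than the real file:

```python
import pandas as pd

def profile_sensors(df: pd.DataFrame) -> dict:
    """Summarize farms, date coverage, and per-farm gaps in daily readings."""
    df = df.assign(date=pd.to_datetime(df["date"]))
    gaps = {}
    for farm, grp in df.groupby("farm_id"):
        # Expect one reading per calendar day between this farm's first and last reading.
        expected = pd.date_range(grp["date"].min(), grp["date"].max(), freq="D")
        missing = expected.difference(grp["date"])
        if len(missing):
            gaps[farm] = list(missing.strftime("%Y-%m-%d"))
    return {
        "n_farms": df["farm_id"].nunique(),
        "date_range": (df["date"].min(), df["date"].max()),
        "columns": list(df.columns),
        "gaps": gaps,
    }

# Tiny synthetic example: farm B is missing one day of readings.
demo = pd.DataFrame({
    "farm_id": ["A", "A", "A", "B", "B"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-03",
             "2023-01-01", "2023-01-03"],
    "temp_c": [21.0, 22.5, 20.8, 19.9, 21.1],
})
report = profile_sensors(demo)
```

The gap report directly satisfies the "which farms and which time periods" criterion; seasonal structure would come from plotting readings grouped by month.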
T-02: Profile harvest records and link to sensor data
Profile harvest-records.csv. Understand: how many harvest periods, yield ranges per farm, any patterns across farms or seasons. Link the harvest data to the sensor data -- which sensor readings correspond to which harvest.
Acceptance criteria:
- Harvest period count and yield ranges documented
- Relationship between sensor data timeline and harvest periods understood
- Any farm-level patterns or anomalies noted (e.g., variety differences)
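One way to express the sensor-to-harvest link is a lookup that returns, for a given farm and harvest end date, the sensor rows in the preceding growing window. The 180-day default window and column names are assumptions to be confirmed against the real files:

```python
import pandas as pd

def sensor_window_for_harvest(sensors: pd.DataFrame, farm_id: str,
                              harvest_end: str, window_days: int = 180) -> pd.DataFrame:
    """Return the sensor rows feeding one harvest: this farm's readings in the
    window_days leading up to (and including) harvest_end."""
    end = pd.Timestamp(harvest_end)
    start = end - pd.Timedelta(days=window_days)
    mask = (
        (sensors["farm_id"] == farm_id)
        & (sensors["date"] > start)
        & (sensors["date"] <= end)
    )
    return sensors.loc[mask]

# Synthetic demo: farm A has three readings inside a 180-day pre-harvest window.
sensors = pd.DataFrame({
    "farm_id": ["A"] * 4 + ["B"] * 2,
    "date": pd.to_datetime(["2023-01-10", "2023-03-01", "2023-06-01",
                            "2023-09-01", "2023-03-01", "2023-06-01"]),
    "temp_c": [20, 21, 22, 23, 19, 18],
})
win = sensor_window_for_harvest(sensors, "A", "2023-06-30")
```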
Group 2: Feature Engineering and Leakage Prevention
T-03: Design feature aggregation from daily readings to harvest-level features
The raw sensor data is daily. The prediction target is per-harvest yield. Design the aggregation: which statistics to compute over which time windows. Consider growing season timing (flowering Oct-Nov, cherry development Dec-Mar in Huila).
Acceptance criteria:
- Feature list with aggregation method (mean, sum, min, max, std) and time window for each
- Rationale for each feature connecting to coffee agriculture domain knowledge
- Derived features (interaction terms, polynomial features) proposed with domain reasoning
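The aggregation design can be captured as a declarative spec (column, statistic, seasonal window) so rationale and implementation stay in one place. Feature names, columns, and windows below are illustrative assumptions keyed to the Huila calendar in the ticket, not the final list:

```python
import pandas as pd

# column, statistic, and month window per feature (Oct-Nov flowering, Dec-Mar development)
FEATURE_SPEC = {
    "rain_sum_flowering": ("rainfall_mm", "sum",  [10, 11]),
    "temp_mean_dev":      ("temp_c",      "mean", [12, 1, 2, 3]),
    "temp_std_dev":       ("temp_c",      "std",  [12, 1, 2, 3]),
    "humidity_min_dev":   ("humidity",    "min",  [12, 1, 2, 3]),
}

def aggregate_features(daily: pd.DataFrame) -> pd.Series:
    """Collapse one farm-harvest's daily readings into the spec'd features."""
    months = pd.to_datetime(daily["date"]).dt.month
    out = {}
    for name, (col, stat, window) in FEATURE_SPEC.items():
        out[name] = daily.loc[months.isin(window), col].agg(stat)
    return pd.Series(out)

# Synthetic demo: two October readings plus one December reading.
demo = pd.DataFrame({
    "date": ["2023-10-01", "2023-10-15", "2023-12-01"],
    "rainfall_mm": [5.0, 7.0, 3.0],
    "temp_c": [20.0, 21.0, 25.0],
    "humidity": [80, 82, 70],
})
feats = aggregate_features(demo)
```

Keeping the spec as data makes the T-09 documentation step mostly mechanical: each entry maps one-to-one to a rationale line.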
T-04: Build feature engineering pipeline with temporal awareness
Implement the feature aggregation. Merge sensor features with harvest targets. Handle sensor gaps (imputation or flagging). Ensure the pipeline respects temporal ordering.
Acceptance criteria:
- Feature pipeline produces one row per farm per harvest with all engineered features
- Sensor gaps handled with documented approach
- Pipeline code is readable and comments explain domain reasoning
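A sketch of the pipeline shape, assuming the daily frame carries a harvest_id linking readings to harvests and the targets live in a yield_kg column (both assumptions); sparse coverage is flagged rather than silently imputed:

```python
import pandas as pd

def build_features(daily: pd.DataFrame, harvests: pd.DataFrame) -> pd.DataFrame:
    """One row per farm per harvest: aggregated sensor features plus target."""
    agg = (
        daily.groupby(["farm_id", "harvest_id"])
        .agg(
            temp_mean=("temp_c", "mean"),
            rain_sum=("rainfall_mm", "sum"),
            n_days=("date", "count"),  # coverage count exposes sensor gaps
        )
        .reset_index()
    )
    # Flag, rather than silently impute, harvests with thin sensor coverage.
    agg["sparse_coverage"] = agg["n_days"] < agg["n_days"].median() * 0.8
    return agg.merge(harvests[["farm_id", "harvest_id", "yield_kg"]],
                     on=["farm_id", "harvest_id"], how="inner")

# Synthetic demo: two farms, one harvest each, full coverage.
daily = pd.DataFrame({
    "farm_id": ["A"] * 3 + ["B"] * 3,
    "harvest_id": [1] * 6,
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 2),
    "temp_c": [20, 21, 22, 19, 18, 20],
    "rainfall_mm": [1.0, 2.0, 0.0, 3.0, 0.0, 1.0],
})
harvests = pd.DataFrame({"farm_id": ["A", "B"], "harvest_id": [1, 1],
                         "yield_kg": [1200.0, 950.0]})
feats = build_features(daily, harvests)
```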
T-05: Implement temporal train/test split
Split the data temporally: train on earlier harvests, test on the most recent harvest. Do not use random splitting -- the data has time ordering that random splits violate.
Acceptance criteria:
- Train set contains harvests 1-3, test set contains harvest 4
- No random shuffling of temporal data
- Split documented with rationale for the temporal boundary
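The split itself is a few lines once features carry a harvest identifier (assumed sortable by time); no shuffling anywhere:

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, period_col: str = "harvest_id",
                   test_periods: int = 1):
    """Hold out the most recent harvest period(s); never shuffle temporal data."""
    periods = sorted(df[period_col].unique())
    cutoff = periods[-test_periods]  # first held-out period
    train = df[df[period_col] < cutoff]
    test = df[df[period_col] >= cutoff]
    return train, test

# Demo: harvests 1-3 train, harvest 4 test, as the criteria specify.
df = pd.DataFrame({"harvest_id": [1, 1, 2, 2, 3, 3, 4, 4],
                   "yield_kg": range(8)})
train, test = temporal_split(df)
```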
T-06: Verify no preprocessing leakage
Verify that all preprocessing (scaling, encoding, feature transformations) happens after the split, not before. Alternatively, use a scikit-learn Pipeline that enforces correct ordering. Run the pipeline and confirm metrics are honest (not inflated by leakage).
Acceptance criteria:
- No preprocessing step uses information from the test set
- Preprocessing either happens after split or inside a scikit-learn Pipeline
- Metrics are honest: comparable to or lower than a simple baseline, not suspiciously high
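The Pipeline route the ticket mentions looks like this: because the scaler is fit inside the Pipeline, it only ever sees the rows passed to .fit(), so test-set statistics cannot leak into preprocessing. Data below is synthetic; the estimator choice is illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=40)
X_train, X_test, y_train, y_test = X[:30], X[30:], y[:30], y[30:]

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)   # scaler statistics come from train rows only
r2 = model.score(X_test, y_test)
```

The leaky anti-pattern to audit for is `StandardScaler().fit(X)` on the full matrix before splitting.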
Group 3: Outlier Handling and Documentation
T-07: Identify and classify outliers with domain reasoning
Examine outliers in the feature data and yield records. For each outlier, determine: is it a variety effect (Gesha farms), a sensor anomaly, or a genuinely exceptional harvest? Do not apply a statistical formula blindly.
Acceptance criteria:
- Outliers identified with specific farm IDs and harvest periods
- Each outlier classified as variety effect, sensor anomaly, or genuine variation
- Classification justified with domain reasoning, not just statistical thresholds
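Statistics can nominate candidates, but classification stays a human call, as the ticket requires. A robust-z sketch that flags candidates and pre-marks known Gesha farms (the farm IDs in gesha_farms are hypothetical placeholders):

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame, col: str = "yield_kg", z: float = 3.0) -> pd.DataFrame:
    """Flag robust-z outliers as *candidates* only; final classification into
    variety effect / sensor anomaly / genuine variation needs domain review."""
    med = df[col].median()
    mad = (df[col] - med).abs().median() or 1e-9  # avoid divide-by-zero
    robust_z = 0.6745 * (df[col] - med) / mad
    out = df.copy()
    out["outlier_candidate"] = robust_z.abs() > z
    gesha_farms = {"F-07", "F-11"}  # hypothetical IDs for Gesha-variety farms
    out["likely_variety_effect"] = (
        out["outlier_candidate"] & out["farm_id"].isin(gesha_farms)
    )
    return out

# Demo: one high-yield farm that happens to be on the Gesha list.
demo = pd.DataFrame({
    "farm_id": ["F-01", "F-02", "F-03", "F-04", "F-07"],
    "yield_kg": [100.0, 102.0, 98.0, 101.0, 300.0],
})
flagged = flag_outliers(demo)
```

Median/MAD is used instead of mean/std so the outlier itself doesn't inflate the threshold it is judged against.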
T-08: Handle outliers based on domain judgment
Keep outliers that represent real data (variety effects, genuine variation). Flag or handle sensor anomalies appropriately. Document the handling decision for each case.
Acceptance criteria:
- Gesha farm data kept (real variety effect, not noise)
- Sensor anomalies flagged or imputed with documented approach
- No statistical outlier removal applied without domain justification
T-09: Document all feature construction decisions
Produce a feature documentation artifact. For each derived feature: what it is, the domain hypothesis behind it, and the expected relationship. For outlier handling: what was kept, what was handled, and why.
Acceptance criteria:
- Every derived feature has a documented rationale
- Outlier handling decisions documented with domain reasoning
- Documentation is auditable: another practitioner can understand the pipeline
Group 4: PyTorch Training and Baseline Comparison
T-10: Set up PyTorch model architecture for yield regression
Define a simple neural network for regression (predicting yield_kg). Select an appropriate loss function (MSELoss for regression). Prepare the data as PyTorch tensors.
Acceptance criteria:
- Neural network architecture defined with input/output dimensions matching data
- MSELoss selected and justified (regression task)
- Data converted to tensors correctly
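A sketch of the architecture, assuming six engineered input features (the actual count comes from T-04); layer widths are a starting point, not a tuned choice:

```python
import torch
import torch.nn as nn

class YieldNet(nn.Module):
    """Small MLP for per-harvest yield regression."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single output: predicted yield_kg
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # (batch, 1) -> (batch,)

model = YieldNet(n_features=6)
loss_fn = nn.MSELoss()          # mean squared error: standard for regression
X = torch.randn(8, 6)           # stand-in batch; real data comes via torch.tensor(...)
pred = model(X)
```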
T-11: Implement training loop with loss curve monitoring
Write the PyTorch training loop with explicit gradient management. Track training loss and validation loss per epoch. Plot loss curves.
Acceptance criteria:
- Training loop zeros gradients at the start of each iteration, before the backward pass
- Both training and validation loss tracked per epoch
- Loss curves plotted showing training progress
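A minimal loop on synthetic data showing the required ordering (zero_grad, forward, backward, step) and per-epoch tracking of both losses; the history dict is what gets plotted:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 4)
y = X @ torch.tensor([1.0, -2.0, 0.5, 0.0]) + 0.1 * torch.randn(64)
X_tr, y_tr, X_val, y_val = X[:48], y[:48], X[48:], y[48:]

model = nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
history = {"train": [], "val": []}

for epoch in range(200):
    model.train()
    opt.zero_grad()                  # clear stale gradients first
    loss = loss_fn(model(X_tr).squeeze(-1), y_tr)
    loss.backward()                  # accumulate fresh gradients
    opt.step()                       # apply the update
    model.eval()
    with torch.no_grad():            # no autograd graph needed for validation
        val = loss_fn(model(X_val).squeeze(-1), y_val)
    history["train"].append(loss.item())
    history["val"].append(val.item())
# history["train"] / history["val"] are ready to plot with matplotlib.
```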
T-12: Configure early stopping with appropriate patience
Add early stopping to prevent overfitting. Monitor validation loss. Set patience that allows learning without excessive overfitting.
Acceptance criteria:
- Early stopping monitors validation loss
- Patience is justified (not too high, not too low)
- Training stops when validation loss stops improving
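Early stopping needs no framework support; a small helper tracked alongside the loop above suffices. The simulated loss curve here stands in for real validation losses:

```python
class EarlyStopper:
    """Stop when validation loss hasn't improved for `patience` epochs."""
    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Simulated validation curve: improves, then plateaus at 0.7.
stopper = EarlyStopper(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.7, 0.73, 0.74, 0.75]
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
```

With this small dataset, a patience around 5-15 epochs is a plausible range; justify the final value from the loss curves rather than a default.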
T-13: Run baseline comparison with MLflow tracking
Train a scikit-learn baseline model (e.g., RandomForest or LinearRegression) on the same temporal split. Log both baseline and PyTorch runs in MLflow with parameters, metrics, and artifacts. Compare.
Acceptance criteria:
- Scikit-learn baseline trained on same temporal split
- Both runs logged in MLflow with all parameters and metrics
- Comparison documented: which model performs better and by how much
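A sketch of the baseline side on synthetic stand-in data; the MLflow calls are shown in comments since they need a tracking setup, and the comparison also includes a predict-the-mean naive model so "better" has a floor to beat:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# MLflow logging sketch (assumes an active tracking URI):
#   with mlflow.start_run(run_name="rf_baseline"):
#       mlflow.log_params(rf.get_params())
#       mlflow.log_metric("test_rmse", rmse)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(scale=0.2, size=100)
X_tr, X_te, y_tr, y_te = X[:75], X[75:], y[:75], y[75:]  # temporal order kept

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
naive_rmse = mean_squared_error(y_te, np.full(25, y_tr.mean())) ** 0.5
```

The PyTorch run logs the same way with its own run_name, so both appear side by side in the MLflow UI for the documented comparison.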
Group 5: Predictions and Delivery
T-14: Generate per-farm yield predictions
Use the best model (based on honest temporal test metrics) to predict yield for each farm for the upcoming harvest. Format predictions as a readable table.
Acceptance criteria:
- Predictions generated for all 12 farms
- Predictions use the model with the best honest test metrics
- Results formatted as a readable table with farm ID, predicted yield, and confidence indicator
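One formatting option, with entirely illustrative numbers: a coarse confidence tier derived from per-farm prediction spread (e.g., the standard deviation across random-forest trees), since a hard confidence interval is not available from a plain point-prediction model:

```python
import numpy as np
import pandas as pd

# All values below are placeholders standing in for real model output.
preds = pd.DataFrame({
    "farm_id": [f"F-{i:02d}" for i in range(1, 13)],
    "predicted_yield_kg": np.round(np.linspace(900, 1500, 12), 1),
    "pred_std": np.linspace(20, 140, 12),  # e.g., spread across RF trees
})
preds["confidence"] = pd.cut(
    preds["pred_std"], bins=[0, 50, 100, np.inf],
    labels=["high", "medium", "low"],
)
table = preds.sort_values("predicted_yield_kg", ascending=False)
```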
T-15: Deliver predictions to Valentina in business terms
Translate model predictions into contract decisions. Flag underperforming farms. Explain key findings about weather patterns and yield drivers. Handle Valentina's scope request about quality prediction.
Acceptance criteria:
- Predictions delivered in business language Valentina can use
- Underperforming farms flagged with explanation
- Scope request (quality scores) managed with clear reasoning
T-16: Write README and close project
Write a project README covering: what was built, the data pipeline, the models, how to rerun predictions with new data. Final commit.
Acceptance criteria:
- README covers project purpose, data pipeline, models, and rerun instructions
- All files committed with descriptive commit messages
- Repository is complete and clean