Tickets: Coffee Yield Prediction
Group 1: Data Profiling and Understanding
T-01: Profile sensor dataset structure and quality
Profile the sensor-data.csv file. Understand: how many farms, what date range, what readings are collected, what the temporal structure looks like (daily readings, seasonal patterns). Identify any data quality issues: missing readings, gaps in coverage, unexpected values.
Acceptance criteria:
- Farm count, date range, and column summary documented
- Sensor gaps identified with which farms and which time periods
- Temporal structure described (daily readings, seasonal patterns)
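A minimal profiling sketch for this ticket, assuming sensor-data.csv has farm_id and date columns plus one column per reading (column names here are illustrative); shown on a tiny synthetic frame rather than the real file:

```python
import pandas as pd

def profile_sensors(df: pd.DataFrame) -> dict:
    """Summarize farms, date coverage, and per-farm gaps in daily readings."""
    df = df.assign(date=pd.to_datetime(df["date"]))
    gaps = {}
    for farm, grp in df.groupby("farm_id"):
        # Expect one reading per calendar day between this farm's first and last reading.
        expected = pd.date_range(grp["date"].min(), grp["date"].max(), freq="D")
        missing = expected.difference(grp["date"])
        if len(missing):
            gaps[farm] = list(missing.strftime("%Y-%m-%d"))
    return {
        "n_farms": df["farm_id"].nunique(),
        "date_range": (df["date"].min(), df["date"].max()),
        "columns": list(df.columns),
        "gaps": gaps,
    }

# Tiny synthetic example: farm B is missing one day of readings.
demo = pd.DataFrame({
    "farm_id": ["A", "A", "A", "B", "B"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-03",
             "2023-01-01", "2023-01-03"],
    "temp_c": [21.0, 22.5, 20.8, 19.9, 21.1],
})
report = profile_sensors(demo)
```

The gap report directly satisfies the "which farms and which time periods" criterion; seasonal structure would come from plotting readings grouped by month.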
T-02: Profile harvest records and link to sensor data
Profile harvest-records.csv. Understand: how many harvest periods, yield ranges per farm, any patterns across farms or seasons. Link the harvest data to the sensor data -- which sensor readings correspond to which harvest.
Acceptance criteria:
- Harvest period count and yield ranges documented
- Relationship between sensor data timeline and harvest periods understood
- Any farm-level patterns or anomalies noted (e.g., variety differences)
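One way to express the sensor-to-harvest link is a lookup that returns, for a given farm and harvest end date, the sensor rows in the preceding growing window. The 180-day default window and column names are assumptions to be confirmed against the real files:

```python
import pandas as pd

def sensor_window_for_harvest(sensors: pd.DataFrame, farm_id: str,
                              harvest_end: str, window_days: int = 180) -> pd.DataFrame:
    """Return the sensor rows feeding one harvest: this farm's readings in the
    window_days leading up to (and including) harvest_end."""
    end = pd.Timestamp(harvest_end)
    start = end - pd.Timedelta(days=window_days)
    mask = (
        (sensors["farm_id"] == farm_id)
        & (sensors["date"] > start)
        & (sensors["date"] <= end)
    )
    return sensors.loc[mask]

# Synthetic demo: farm A has three readings inside a 180-day pre-harvest window.
sensors = pd.DataFrame({
    "farm_id": ["A"] * 4 + ["B"] * 2,
    "date": pd.to_datetime(["2023-01-10", "2023-03-01", "2023-06-01",
                            "2023-09-01", "2023-03-01", "2023-06-01"]),
    "temp_c": [20, 21, 22, 23, 19, 18],
})
win = sensor_window_for_harvest(sensors, "A", "2023-06-30")
```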
Group 2: Feature Engineering and Leakage Prevention
T-03: Design feature aggregation from daily readings to harvest-level features
The raw sensor data is daily. The prediction target is per-harvest yield. Design the aggregation: which statistics to compute over which time windows. Consider growing season timing (flowering Oct-Nov, cherry development Dec-Mar in Huila).
Acceptance criteria:
- Feature list with aggregation method (mean, sum, min, max, std) and time window for each
- Rationale for each feature connecting to coffee agriculture domain knowledge
- Derived features (interaction terms, polynomial features) proposed with domain reasoning
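The aggregation design can be captured as a declarative spec (column, statistic, seasonal window) so rationale and implementation stay in one place. Feature names, columns, and windows below are illustrative assumptions keyed to the Huila calendar in the ticket, not the final list:

```python
import pandas as pd

# column, statistic, and month window per feature (Oct-Nov flowering, Dec-Mar development)
FEATURE_SPEC = {
    "rain_sum_flowering": ("rainfall_mm", "sum",  [10, 11]),
    "temp_mean_dev":      ("temp_c",      "mean", [12, 1, 2, 3]),
    "temp_std_dev":       ("temp_c",      "std",  [12, 1, 2, 3]),
    "humidity_min_dev":   ("humidity",    "min",  [12, 1, 2, 3]),
}

def aggregate_features(daily: pd.DataFrame) -> pd.Series:
    """Collapse one farm-harvest's daily readings into the spec'd features."""
    months = pd.to_datetime(daily["date"]).dt.month
    out = {}
    for name, (col, stat, window) in FEATURE_SPEC.items():
        out[name] = daily.loc[months.isin(window), col].agg(stat)
    return pd.Series(out)

# Synthetic demo: two October readings plus one December reading.
demo = pd.DataFrame({
    "date": ["2023-10-01", "2023-10-15", "2023-12-01"],
    "rainfall_mm": [5.0, 7.0, 3.0],
    "temp_c": [20.0, 21.0, 25.0],
    "humidity": [80, 82, 70],
})
feats = aggregate_features(demo)
```

Keeping the spec as data makes the T-09 documentation step mostly mechanical: each entry maps one-to-one to a rationale line.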
T-04: Build feature engineering pipeline with temporal awareness
Implement the feature aggregation. Merge sensor features with harvest targets. Handle sensor gaps (imputation or flagging). Ensure the pipeline respects temporal ordering.
Acceptance criteria:
- Feature pipeline produces one row per farm per harvest with all engineered features
- Sensor gaps handled with documented approach
- Pipeline code is readable and comments explain domain reasoning
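A sketch of the pipeline shape, assuming the daily frame carries a harvest_id linking readings to harvests and the targets live in a yield_kg column (both assumptions); sparse coverage is flagged rather than silently imputed:

```python
import pandas as pd

def build_features(daily: pd.DataFrame, harvests: pd.DataFrame) -> pd.DataFrame:
    """One row per farm per harvest: aggregated sensor features plus target."""
    agg = (
        daily.groupby(["farm_id", "harvest_id"])
        .agg(
            temp_mean=("temp_c", "mean"),
            rain_sum=("rainfall_mm", "sum"),
            n_days=("date", "count"),  # coverage count exposes sensor gaps
        )
        .reset_index()
    )
    # Flag, rather than silently impute, harvests with thin sensor coverage.
    agg["sparse_coverage"] = agg["n_days"] < agg["n_days"].median() * 0.8
    return agg.merge(harvests[["farm_id", "harvest_id", "yield_kg"]],
                     on=["farm_id", "harvest_id"], how="inner")

# Synthetic demo: two farms, one harvest each, full coverage.
daily = pd.DataFrame({
    "farm_id": ["A"] * 3 + ["B"] * 3,
    "harvest_id": [1] * 6,
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 2),
    "temp_c": [20, 21, 22, 19, 18, 20],
    "rainfall_mm": [1.0, 2.0, 0.0, 3.0, 0.0, 1.0],
})
harvests = pd.DataFrame({"farm_id": ["A", "B"], "harvest_id": [1, 1],
                         "yield_kg": [1200.0, 950.0]})
feats = build_features(daily, harvests)
```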
T-05: Implement temporal train/test split
Split the data temporally: train on earlier harvests, test on the most recent harvest. Do not use random splitting -- the data has time ordering that random splits violate.
Acceptance criteria:
- Train set contains harvests 1-3, test set contains harvest 4
- No random shuffling of temporal data
- Split documented with rationale for the temporal boundary
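The split itself is a few lines once features carry a harvest identifier (assumed sortable by time); no shuffling anywhere:

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, period_col: str = "harvest_id",
                   test_periods: int = 1):
    """Hold out the most recent harvest period(s); never shuffle temporal data."""
    periods = sorted(df[period_col].unique())
    cutoff = periods[-test_periods]  # first held-out period
    train = df[df[period_col] < cutoff]
    test = df[df[period_col] >= cutoff]
    return train, test

# Demo: harvests 1-3 train, harvest 4 test, as the criteria specify.
df = pd.DataFrame({"harvest_id": [1, 1, 2, 2, 3, 3, 4, 4],
                   "yield_kg": range(8)})
train, test = temporal_split(df)
```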
T-06: Verify no preprocessing leakage
Verify that all preprocessing (scaling, encoding, feature transformations) happens after the split, not before. Alternatively, use a scikit-learn Pipeline that enforces correct ordering. Run the pipeline and confirm metrics are honest (not inflated by leakage).
Acceptance criteria:
- No preprocessing step uses information from the test set
- Preprocessing either happens after split or inside a scikit-learn Pipeline
- Metrics are honest: comparable to or lower than a simple baseline, not suspiciously high
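The Pipeline route the ticket mentions looks like this: because the scaler is fit inside the Pipeline, it only ever sees the rows passed to .fit(), so test-set statistics cannot leak into preprocessing. Data below is synthetic; the estimator choice is illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=40)
X_train, X_test, y_train, y_test = X[:30], X[30:], y[:30], y[30:]

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)   # scaler statistics come from train rows only
r2 = model.score(X_test, y_test)
```

The leaky anti-pattern to audit for is `StandardScaler().fit(X)` on the full matrix before splitting.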
Group 3: Outlier Handling and Documentation
T-07: Identify and classify outliers with domain reasoning
Examine outliers in the feature data and yield records. For each outlier, determine: is it a variety effect (Gesha farms), a sensor anomaly, or a genuinely exceptional harvest? Do not apply a statistical formula blindly.
Acceptance criteria:
- Outliers identified with specific farm IDs and harvest periods
- Each outlier classified as variety effect, sensor anomaly, or genuine variation
- Classification justified with domain reasoning, not just statistical thresholds
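Statistics can nominate candidates, but classification stays a human call, as the ticket requires. A robust-z sketch that flags candidates and pre-marks known Gesha farms (the farm IDs in gesha_farms are hypothetical placeholders):

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame, col: str = "yield_kg", z: float = 3.0) -> pd.DataFrame:
    """Flag robust-z outliers as *candidates* only; final classification into
    variety effect / sensor anomaly / genuine variation needs domain review."""
    med = df[col].median()
    mad = (df[col] - med).abs().median() or 1e-9  # avoid divide-by-zero
    robust_z = 0.6745 * (df[col] - med) / mad
    out = df.copy()
    out["outlier_candidate"] = robust_z.abs() > z
    gesha_farms = {"F-07", "F-11"}  # hypothetical IDs for Gesha-variety farms
    out["likely_variety_effect"] = (
        out["outlier_candidate"] & out["farm_id"].isin(gesha_farms)
    )
    return out

# Demo: one high-yield farm that happens to be on the Gesha list.
demo = pd.DataFrame({
    "farm_id": ["F-01", "F-02", "F-03", "F-04", "F-07"],
    "yield_kg": [100.0, 102.0, 98.0, 101.0, 300.0],
})
flagged = flag_outliers(demo)
```

Median/MAD is used instead of mean/std so the outlier itself doesn't inflate the threshold it is judged against.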
T-08: Handle outliers based on domain judgment
Keep outliers that represent real data (variety effects, genuine variation). Flag or handle sensor anomalies appropriately. Document the handling decision for each case.
Acceptance criteria:
- Gesha farm data kept (real variety effect, not noise)
- Sensor anomalies flagged or imputed with documented approach
- No statistical outlier removal applied without domain justification
T-09: Document all feature construction decisions
Produce a feature documentation artifact. For each derived feature: what it is, the domain hypothesis behind it, and the expected relationship. For outlier handling: what was kept, what was handled, and why.
Acceptance criteria:
- Every derived feature has a documented rationale
- Outlier handling decisions documented with domain reasoning
- Documentation is auditable: another practitioner can understand the pipeline
Group 4: PyTorch Training and Baseline Comparison
T-10: Set up PyTorch model architecture for yield regression
Define a simple neural network for regression (predicting yield_kg). Select an appropriate loss function (MSELoss for regression). Prepare the data as PyTorch tensors.
Acceptance criteria:
- Neural network architecture defined with input/output dimensions matching data
- MSELoss selected and justified (regression task)
- Data converted to tensors correctly
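A sketch of the architecture, assuming six engineered input features (the actual count comes from T-04); layer widths are a starting point, not a tuned choice:

```python
import torch
import torch.nn as nn

class YieldNet(nn.Module):
    """Small MLP for per-harvest yield regression."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single output: predicted yield_kg
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # (batch, 1) -> (batch,)

model = YieldNet(n_features=6)
loss_fn = nn.MSELoss()          # mean squared error: standard for regression
X = torch.randn(8, 6)           # stand-in batch; real data comes via torch.tensor(...)
pred = model(X)
```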
T-11: Implement training loop with loss curve monitoring
Write the PyTorch training loop with explicit gradient management. Track training loss and validation loss per epoch. Plot loss curves.
Acceptance criteria:
- Training loop zeros gradients at the start of each iteration, before the backward pass
- Both training and validation loss tracked per epoch
- Loss curves plotted showing training progress
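A minimal loop on synthetic data showing the required ordering (zero_grad, forward, backward, step) and per-epoch tracking of both losses; the history dict is what gets plotted:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 4)
y = X @ torch.tensor([1.0, -2.0, 0.5, 0.0]) + 0.1 * torch.randn(64)
X_tr, y_tr, X_val, y_val = X[:48], y[:48], X[48:], y[48:]

model = nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
history = {"train": [], "val": []}

for epoch in range(200):
    model.train()
    opt.zero_grad()                  # clear stale gradients first
    loss = loss_fn(model(X_tr).squeeze(-1), y_tr)
    loss.backward()                  # accumulate fresh gradients
    opt.step()                       # apply the update
    model.eval()
    with torch.no_grad():            # no autograd graph needed for validation
        val = loss_fn(model(X_val).squeeze(-1), y_val)
    history["train"].append(loss.item())
    history["val"].append(val.item())
# history["train"] / history["val"] are ready to plot with matplotlib.
```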
T-12: Configure early stopping with appropriate patience
Add early stopping to prevent overfitting. Monitor validation loss. Set patience that allows learning without excessive overfitting.
Acceptance criteria:
- Early stopping monitors validation loss
- Patience is justified (not too high, not too low)
- Training stops when validation loss stops improving
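Early stopping needs no framework support; a small helper tracked alongside the loop above suffices. The simulated loss curve here stands in for real validation losses:

```python
class EarlyStopper:
    """Stop when validation loss hasn't improved for `patience` epochs."""
    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Simulated validation curve: improves, then plateaus at 0.7.
stopper = EarlyStopper(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.7, 0.73, 0.74, 0.75]
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
```

With this small dataset, a patience around 5-15 epochs is a plausible range; justify the final value from the loss curves rather than a default.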
T-13: Run baseline comparison with MLflow tracking
Train a scikit-learn baseline model (e.g., RandomForest or LinearRegression) on the same temporal split. Log both baseline and PyTorch runs in MLflow with parameters, metrics, and artifacts. Compare.
Acceptance criteria:
- Scikit-learn baseline trained on same temporal split
- Both runs logged in MLflow with all parameters and metrics
- Comparison documented: which model performs better and by how much
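A sketch of the baseline side on synthetic stand-in data; the MLflow calls are shown in comments since they need a tracking setup, and the comparison also includes a predict-the-mean naive model so "better" has a floor to beat:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# MLflow logging sketch (assumes an active tracking URI):
#   with mlflow.start_run(run_name="rf_baseline"):
#       mlflow.log_params(rf.get_params())
#       mlflow.log_metric("test_rmse", rmse)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(scale=0.2, size=100)
X_tr, X_te, y_tr, y_te = X[:75], X[75:], y[:75], y[75:]  # temporal order kept

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
naive_rmse = mean_squared_error(y_te, np.full(25, y_tr.mean())) ** 0.5
```

The PyTorch run logs the same way with its own run_name, so both appear side by side in the MLflow UI for the documented comparison.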
Group 5: Predictions and Delivery
T-14: Generate per-farm yield predictions
Use the best model (based on honest temporal test metrics) to predict yield for each farm for the upcoming harvest. Format predictions as a readable table.
Acceptance criteria:
- Predictions generated for all 12 farms
- Predictions use the model with the best honest test metrics
- Results formatted as a readable table with farm ID, predicted yield, and confidence indicator
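One formatting option, with entirely illustrative numbers: a coarse confidence tier derived from per-farm prediction spread (e.g., the standard deviation across random-forest trees), since a hard confidence interval is not available from a plain point-prediction model:

```python
import numpy as np
import pandas as pd

# All values below are placeholders standing in for real model output.
preds = pd.DataFrame({
    "farm_id": [f"F-{i:02d}" for i in range(1, 13)],
    "predicted_yield_kg": np.round(np.linspace(900, 1500, 12), 1),
    "pred_std": np.linspace(20, 140, 12),  # e.g., spread across RF trees
})
preds["confidence"] = pd.cut(
    preds["pred_std"], bins=[0, 50, 100, np.inf],
    labels=["high", "medium", "low"],
)
table = preds.sort_values("predicted_yield_kg", ascending=False)
```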
T-15: Deliver predictions to Valentina in business terms
Translate model predictions into contract decisions. Flag underperforming farms. Explain key findings about weather patterns and yield drivers. Handle Valentina's scope request about quality prediction.
Acceptance criteria:
- Predictions delivered in business language Valentina can use
- Underperforming farms flagged with explanation
- Scope request (quality scores) managed with clear reasoning
T-16: Write README and close project
Write a project README covering: what was built, the data pipeline, the models, how to rerun predictions with new data. Final commit.
Acceptance criteria:
- README covers project purpose, data pipeline, models, and rerun instructions
- All files committed with descriptive commit messages
- Repository is complete and clean