Learn by Directing AI
Unit 4

Prediction Logging and Health Monitoring

Step 1: Understand prediction logging

Open materials/logging-spec-template.md.

Every prediction the model makes is a data point you might need later. Without prediction logs, you cannot answer basic questions: What did the model predict last week? How far off were those predictions from actual harvests? Is the model getting worse over time? Prediction logging is infrastructure -- it creates the raw material for every monitoring, evaluation, and debugging task that follows.

The template lists the fields your logging schema needs: timestamp, request_id, farm_id, input_features, predicted_yield_kg, confidence_score, response_time_ms, model_version. Each field has a purpose. Your job is to decide the format and constraints for each.

Step 2: Design the logging schema

Work through the template's fields and fill in your design decisions. Consider: what format should the timestamp use? (ISO 8601 with timezone, so predictions from different locations are comparable.) What goes into the input_features field? (The raw values sent to the model, so you can see exactly what the model saw.) How do you handle confidence_score if the model does not natively produce one?
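
One way to answer the last question: if the model is a tree ensemble, the spread of per-tree predictions can be mapped to a 0-1 score. A minimal sketch, assuming a regression ensemble -- the function name and the scaling are illustrative, not part of the template (a scikit-learn RandomForestRegressor exposes per-tree models via .estimators_):

```python
import statistics

def confidence_from_ensemble(per_tree_predictions):
    """Derive a 0-1 confidence score from how much the trees agree."""
    mean = statistics.fmean(per_tree_predictions)
    if mean == 0:
        return 0.0
    spread = statistics.pstdev(per_tree_predictions)
    # Tight agreement between trees -> relative spread near 0 -> score near 1.
    return max(0.0, 1.0 - spread / abs(mean))

print(confidence_from_ensemble([410, 405, 415, 408]))  # close to 1.0
```

Whatever scheme you choose, record it in the template so the score's meaning is documented alongside the field.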

The template recommends JSON Lines format -- one JSON object per line, append-only. Each line is independently parseable, which makes it easy to read the last few predictions or stream new entries without loading the entire file.
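
The format's key property can be sketched in a few lines (io.StringIO stands in for the log file; the field values are illustrative):

```python
import io
import json

# Append-only JSON Lines: one JSON object per line.
entries = [
    {"request_id": "a1", "farm_id": "farm_01", "predicted_yield_kg": 412.5},
    {"request_id": "a2", "farm_id": "farm_02", "predicted_yield_kg": 398.0},
]
buf = io.StringIO()
for e in entries:
    buf.write(json.dumps(e) + "\n")

# Each line parses independently -- no need to load the whole file.
lines = buf.getvalue().splitlines()
last = json.loads(lines[-1])
print(last["farm_id"])  # -> farm_02
```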

Step 3: Implement prediction logging

Direct Claude to add prediction logging middleware to serve.py:

Add prediction logging to serve.py. Every prediction should be logged to a JSON Lines file at logs/predictions.jsonl. Each log entry must contain: timestamp (ISO 8601 with timezone), request_id (UUID), farm_id, input_features (the raw input values), predicted_yield_kg, confidence_score, response_time_ms, and model_version. Use Python's logging module or direct file writes. Create the logs directory if it doesn't exist.
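
It helps to have a mental model of the result before reviewing Claude's output. A minimal sketch, assuming a standalone helper -- the name log_prediction and the model_version default are illustrative, and the real serve.py wiring will differ:

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

def log_prediction(farm_id, input_features, predicted_yield_kg,
                   confidence_score, response_time_ms,
                   model_version="unknown",
                   log_path=Path("logs/predictions.jsonl")):
    """Append one prediction record to a JSON Lines log file."""
    log_path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ if missing
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601 + timezone
        "request_id": str(uuid.uuid4()),
        "farm_id": farm_id,
        "input_features": input_features,  # raw values, as received
        "predicted_yield_kg": predicted_yield_kg,
        "confidence_score": confidence_score,
        "response_time_ms": response_time_ms,
        "model_version": model_version,
    }
    with log_path.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

record = log_prediction("farm_01", {"temperature": 22.5}, 410.2, 0.87, 12.3)
print(record["request_id"])
```

In the real service, response_time_ms would be measured around the model call (e.g. with time.perf_counter) rather than passed in by hand.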

Step 4: Review AI's logging implementation

Read what Claude generated. AI commonly generates logging middleware that captures request and response bodies but omits the metadata that makes logs useful. Check for: timestamps without timezone information, missing confidence scores, missing response time measurement, input features stored as processed tensors instead of raw values.

The template gave you structure. Your design decisions specified what each field should contain. If the implementation doesn't match your schema, that's a gap to close before moving on.

Step 5: Design and implement health checks

The current /health endpoint returns {"status": "healthy"} without checking anything. That tells you the process is running, not that the system is ready to serve predictions.

The logging spec template asks: what should /health verify? Think about everything the endpoint depends on that could fail. The model file needs to be loaded. The feature pipeline needs to be importable. The log directory needs to be writable. A health check that verifies these things tells you "the system is ready." One that just returns "ok" tells you nothing useful.

Direct Claude to update the health endpoint:

Update the /health endpoint in serve.py to verify: the model is loaded and can produce output, the feature pipeline is importable, and the log directory exists and is writable. Return a JSON response with status, model_loaded, and dependency_status fields. If any check fails, return an appropriate error status.
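
Before reading the generated endpoint, it helps to see what the checks reduce to. A framework-agnostic sketch -- here json stands in for the real feature-pipeline module, and model for the loaded model object:

```python
import importlib
import os
from pathlib import Path

def health_check(model, pipeline_module="json", log_dir="logs"):
    """Run readiness checks; 'json' is a stand-in for the real pipeline module."""
    checks = {
        "model_loaded": model is not None,
        "pipeline_importable": False,
        "log_dir_writable": False,
    }
    try:
        importlib.import_module(pipeline_module)
        checks["pipeline_importable"] = True
    except ImportError:
        pass
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)
    checks["log_dir_writable"] = os.access(log_path, os.W_OK)
    status = "healthy" if all(checks.values()) else "unhealthy"
    return {"status": status, **checks}

print(health_check(model=object()))
```

A production version would also run a dummy prediction to confirm the model "can produce output," not just that the object exists, and the endpoint would map "unhealthy" to an HTTP error status.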

Step 6: Run predictions and verify logging

Rebuild the container with the updated serve.py (or run locally first to verify), then run five predictions:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"farm_id": "farm_01", "temperature": 22.5, "rainfall": 15.0, "soil_moisture": 45.0, "humidity": 72.0, "altitude": 1650.0}'

Run this five times with different farm IDs or sensor values. Then check the log file:

cat logs/predictions.jsonl

Each entry should contain all the fields from your logging schema: timestamp with timezone, request_id, farm_id, the raw input features, the predicted yield, a confidence score, and the response time in milliseconds. Five predictions should produce five entries.
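
Eyeballing the cat output works for five entries, but a small script can check the schema mechanically. A sketch -- the REQUIRED set mirrors the template's fields, and the sample line is illustrative:

```python
import json

REQUIRED = {"timestamp", "request_id", "farm_id", "input_features",
            "predicted_yield_kg", "confidence_score", "response_time_ms"}

def check_entries(jsonl_text, expected_count):
    """Parse JSON Lines text and list schema problems (empty list = pass)."""
    entries = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    problems = []
    if len(entries) != expected_count:
        problems.append(f"expected {expected_count} entries, found {len(entries)}")
    for i, entry in enumerate(entries):
        missing = REQUIRED - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems

sample = ('{"timestamp": "2024-06-01T08:30:00+00:00", "request_id": "r1", '
          '"farm_id": "farm_01", "input_features": {}, "predicted_yield_kg": 400.0, '
          '"confidence_score": 0.9, "response_time_ms": 11.2}')
print(check_entries(sample, expected_count=1))  # -> []
```

To check the real file, pass its contents: check_entries(open("logs/predictions.jsonl").read(), 5).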

✓ Check

Check: After running 5 predictions via curl, the log file contains 5 entries with timestamp, input features, output prediction, confidence score, and response time for each.