
logging-spec-template.md

Logging Spec Template: Prediction Logging Schema

Why log predictions

Every prediction your model makes is a data point you might need later. Without prediction logs, you cannot answer basic questions: What did the model predict last week? How far off were those predictions from actual results? Is the model getting worse over time? Prediction logging is infrastructure -- it creates the raw material for every monitoring, evaluation, and debugging task that follows.

Required fields

Design your logging schema by deciding what to capture for each prediction. The fields below are the minimum. For each, decide the format and any constraints.

| Field | Purpose | Your design |
| --- | --- | --- |
| timestamp | When the prediction was made. Include timezone -- predictions from different systems in different timezones need to be comparable. | |
| request_id | Unique identifier for this prediction request. Enables tracing a specific prediction through the system. | |
| farm_id | Which farm this prediction is for. Comes from the input. | |
| input_features | The raw input values sent to the model. Captures what the model saw when it made this prediction. | |
| predicted_yield_kg | The model's output. The actual prediction value. | |
| confidence_score | How confident the model is in this prediction. If your model does not natively produce a confidence score, note what you would use as a proxy or leave a design note. | |
| response_time_ms | How long the prediction took from request to response. Performance monitoring -- a sudden spike means something changed. | |
| model_version | Which model file served this prediction. When you retrain and deploy a new model, this field tells you which predictions came from which version. | |
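One way to pin down the format and constraints for each field is to express the schema in code. The sketch below uses a Python dataclass with the fields from the table; the example values (request IDs, farm IDs, feature names) are hypothetical placeholders, not part of the spec.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import datetime

@dataclass
class PredictionLogRecord:
    timestamp: str               # ISO 8601 with timezone, so systems stay comparable
    request_id: str              # unique per prediction request
    farm_id: str                 # comes from the input
    input_features: dict         # raw inputs, may be nested
    predicted_yield_kg: float    # the model's output
    confidence_score: Optional[float]  # None if the model has no native score
    response_time_ms: float      # request-to-response latency
    model_version: str           # which model file served this prediction

# Hypothetical example record
record = PredictionLogRecord(
    timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
    request_id="req-0001",
    farm_id="farm-42",
    input_features={"rainfall_mm": 10.2, "soil": {"ph": 6.5}},
    predicted_yield_kg=1234.5,
    confidence_score=None,       # design note: model has no native score
    response_time_ms=12.7,
    model_version="v3",
)
```

A typed record like this makes the constraints explicit (which fields are required, which may be null) and `asdict(record)` yields a dict ready to serialize.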

Storage format

Prediction logs should be append-only. Each prediction adds one entry. Two common formats:

JSON Lines (.jsonl): One JSON object per line. Each line is independently parseable. Easy to append. Easy to read with standard tools (cat, jq, Python's json module). Good for structured data with nested fields (like input_features).

CSV: One row per prediction. Columns map to fields. Simple but struggles with nested data (input_features would need to be flattened or serialized).

For predictions with nested input features, JSON Lines is typically the better choice.
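The append-only JSON Lines pattern is a few lines of code: serialize each record to one line and open the file in append mode. A minimal sketch, with hypothetical record contents:

```python
import json
import os
import tempfile

def append_prediction(log_path, record):
    # Append-only: one JSON object per line; earlier lines are never rewritten.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_path = os.path.join(tempfile.mkdtemp(), "predictions.jsonl")
append_prediction(log_path, {"request_id": "req-1", "predicted_yield_kg": 1200.0,
                             "input_features": {"rainfall_mm": 10.2}})
append_prediction(log_path, {"request_id": "req-2", "predicted_yield_kg": 980.5,
                             "input_features": {"rainfall_mm": 4.1}})

# Each line parses independently, so streaming tools and partial reads work.
with open(log_path) as f:
    records = [json.loads(line) for line in f]
```

Note that nested input_features round-trips without any flattening, which is where CSV would start to hurt.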

Health check design

The /health endpoint answers one question: "Is the system ready to serve predictions?" A health check that returns {"status": "ok"} without checking anything is not a health check -- it is a liveness probe that tells you the process is running, not that it can serve.

What should /health verify? List everything the endpoint depends on that could fail:

  • _________________________________
  • _________________________________
  • _________________________________
  • _________________________________

For each item, decide: what does "healthy" look like? What does "unhealthy" look like? What should the response include when something is unhealthy?
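As one possible shape for the answer, a health check can run each dependency check, report per-check results, and return 503 when any fail. The checks below (model loaded, model file present, log directory writable) are illustrative assumptions, not the required list -- your list comes from the exercise above. The function returns a status code and body rather than binding to any particular web framework.

```python
import os
import tempfile

def health_check(model, model_path, log_dir):
    """Return (http_status, body) for a /health endpoint.

    Illustrative checks only -- the real list is whatever the
    endpoint depends on that could fail.
    """
    checks = {
        "model_loaded": model is not None,          # model object in memory
        "model_file_present": os.path.exists(model_path),
        "log_dir_writable": os.access(log_dir, os.W_OK),
    }
    healthy = all(checks.values())
    # Unhealthy responses include per-check results, so the caller
    # can see *what* failed, not just that something did.
    body = {"status": "ok" if healthy else "unhealthy", "checks": checks}
    return (200 if healthy else 503), body

# Hypothetical usage: a loaded model, an existing file, a writable directory
log_dir = tempfile.mkdtemp()
status, body = health_check(model=object(), model_path=__file__, log_dir=log_dir)
```

Returning `{"status": "ok"}` unconditionally would pass the liveness bar but fail this readiness bar: here, "ok" means every dependency check actually ran and passed.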

The availability vs correctness distinction

Health monitoring answers "is the system available?" It tells you the process is running, the model is loaded, the dependencies are reachable. It does not tell you "is the system correct?" -- whether the predictions are accurate, whether the model has degraded, whether the input data distribution has shifted.

Prediction logs are the bridge. By logging every prediction with its inputs and outputs, you create the raw material to eventually answer correctness questions. When actual harvest data comes in, you can compare predictions to reality. That comparison is drift detection -- a future concern. For now, the logs are infrastructure: they exist so that future monitoring is possible.
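To make the bridge concrete, here is a sketch of the future comparison the logs enable: joining logged predictions to actuals and computing an error metric. The records and harvest figures are invented for illustration; the point is that farm_id and predicted_yield_kg from the logs are all the join needs.

```python
# Hypothetical parsed prediction-log records (one per JSONL line)
predictions = [
    {"request_id": "req-1", "farm_id": "farm-42", "predicted_yield_kg": 1200.0},
    {"request_id": "req-2", "farm_id": "farm-7",  "predicted_yield_kg": 980.5},
]

# Hypothetical actual harvest results, keyed by farm
actuals = {"farm-42": 1150.0, "farm-7": 1010.0}

# Join each prediction to reality and compute absolute error
errors = [abs(p["predicted_yield_kg"] - actuals[p["farm_id"]])
          for p in predictions]
mae = sum(errors) / len(errors)
print(mae)  # → 39.75
```

None of this code can exist without the logs: the comparison is only possible because every prediction was captured with its identifiers at serving time.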