
eval-template.md

Evaluation Design

Prediction Target

What are you predicting? What format does the prediction take (a number, a category, a ranking)? What does the client need this prediction for?

Fill in: describe what the model predicts and why it matters for the client's business.

Business Context

What business decision does this prediction inform? What happens if the prediction is wrong? How wrong is acceptable?

Fill in: connect the prediction to the client's specific business problem and the cost of errors.

Metric Selection

Which metrics will you use to evaluate the model? For each metric:

  • What does it measure?
  • Why is it appropriate for this problem?
  • What would a misleading metric look like for this problem?

Fill in: select metrics and justify each one against the business context. Consider: MAE (mean absolute error: average error magnitude, in the same units as the prediction), RMSE (root mean squared error: penalizes large errors more heavily than MAE), R-squared (proportion of variance in the target explained by the model).
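As a reference for what these metrics compute, here is a minimal sketch of all three in plain Python (the example target and prediction values are invented for illustration):

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average |error|, in the same units as the target.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: squaring penalizes large errors more than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    # Proportion of variance in y_true explained by the predictions.
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]
print(mae(y_true, y_pred))   # 10.0
print(rmse(y_true, y_pred))  # 10.0 (equal to MAE here because every error is the same size)
```

Note how MAE and RMSE coincide when all errors are equal; RMSE exceeds MAE as soon as errors vary in size, which is exactly the "penalizes large errors" property to weigh against the business context.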

Threshold Justification

What performance level makes the model useful? How does this compare to the current method? What's the minimum acceptable performance?

Fill in: set a concrete threshold and justify it against the client's current approach.
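One way to make the threshold concrete is to state it as a rule the model must pass against the current method. The function and the 15% figure below are hypothetical placeholders, not a recommended number:

```python
def meets_threshold(model_mae, baseline_mae, required_improvement=0.15):
    # Hypothetical acceptance rule: the model must reduce the current
    # method's average error by at least required_improvement (15% here)
    # to justify switching.
    return model_mae <= baseline_mae * (1 - required_improvement)

print(meets_threshold(8.0, 10.0))   # True: a 20% improvement clears the bar
print(meets_threshold(9.5, 10.0))   # False: only a 5% improvement
```

Whatever the actual numbers, writing the threshold as an explicit rule like this forces the justification the section asks for: why that margin, relative to the cost of errors.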

Splitting Strategy

How will you split the data into training and test sets? Why this approach? What would be wrong with an alternative approach?

Fill in: describe and justify the splitting strategy. Consider the structure of the data.
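If the data has temporal structure, one defensible strategy is a chronological split: train on the past, test on the most recent slice. A minimal sketch, assuming records carry a sortable `"date"` field (the field name and 20% fraction are illustrative):

```python
def chronological_split(records, test_fraction=0.2):
    # Sort by timestamp, then hold out the most recent slice as the test set.
    # A purely random split would let the model train on records from after
    # some test records, leaking future information.
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]

data = [{"date": d, "y": d * 2} for d in range(10)]
train, test = chronological_split(data)
print(len(train), len(test))  # 8 2
```

The "what would be wrong with an alternative" question has a ready answer here: a random split would score the model on interpolation between dates it has already seen, overstating real-world performance.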

Baseline Definition

What is the simplest prediction method you'll compare against? Why this baseline?

Fill in: define a baseline that represents the "do nothing differently" option.
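A common "do nothing differently" baseline for a numeric prediction is to always predict the training-set mean, ignoring the features entirely. A minimal sketch (the training values and feature dict are invented for illustration):

```python
def mean_baseline(train_targets):
    # "Do nothing differently": ignore the features and always predict
    # the average of the targets seen during training.
    mean = sum(train_targets) / len(train_targets)
    return lambda features: mean

predict = mean_baseline([100.0, 150.0, 200.0])
print(predict({"sqft": 1200}))  # 150.0, regardless of the input
```

Any candidate model should beat this baseline on the chosen metrics; if it cannot, the features are adding nothing over the client's simplest possible approach.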