Learn by Directing AI
Unit 4

Better models and honest comparison

Step 1: Build a decision tree classifier

Direct AI to fit a decision tree classifier on the same training data. Get the confusion matrix and precision/recall for the Reserve class. Compare with the logistic regression results from Unit 3.

A decision tree works differently from logistic regression. It learns rules: "if altitude > 1100m AND aging > 18 months AND fermentation_temp < 25C, predict Reserve." It can capture non-linear patterns that logistic regression misses.
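A minimal sketch of this step, using scikit-learn on synthetic stand-in data (the real barrel dataset, its column names, and the thresholds in the synthetic labels are assumptions for illustration only):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Synthetic stand-in for the barrel data; real columns and values will differ.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.normal(1000, 150, n),  # altitude (m)
    rng.normal(15, 5, n),      # aging (months)
    rng.normal(26, 3, n),      # fermentation temperature (C)
])
# Toy rule: high-altitude, cool-fermented barrels are "Reserve" (1).
y = ((X[:, 0] > 1050) & (X[:, 2] < 27)).astype(int)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
pred = tree.predict(X)

print(confusion_matrix(y, pred))
print("Reserve precision:", precision_score(y, pred, zero_division=0))
print("Reserve recall:   ", recall_score(y, pred, zero_division=0))
```

In practice you would fit on the same training split used for the logistic regression in Unit 3 and evaluate both on the same held-out data, so the comparison is apples to apples.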

Step 2: Compare models using correct metrics

Not accuracy. Compare the two models on precision, recall, and F1 for the Reserve class.

Which model catches more Reserve barrels (higher recall)? Which has fewer false alarms (higher precision)? The comparison is multi-dimensional. One model might catch more Reserve barrels but also flag more false alarms. The "better" model depends on what Luciana is willing to trade.
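One way to make the comparison concrete, sketched on toy data (the dataset, feature meanings, and model settings here are placeholders, not the course's actual data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_fscore_support

# Toy data: the "Reserve" label (1) depends on two of three features.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = ((X[:, 0] > 0.3) & (X[:, 1] > -0.5)).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic": LogisticRegression().fit(X_tr, y_tr),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr),
}
results = {}
for name, m in models.items():
    # average="binary" reports metrics for the positive (Reserve) class only.
    p, r, f1, _ = precision_recall_fscore_support(
        y_te, m.predict(X_te), average="binary", zero_division=0)
    results[name] = (p, r, f1)
    print(f"{name:>8}: precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")
```

Reading the two rows side by side makes the trade-off visible: a higher-recall model finds more true Reserves, a higher-precision model wastes fewer false alarms.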

Step 3: ROC curves and AUC

Both models output a probability for each barrel. The default threshold is 0.5 -- above it, predict Reserve; below it, predict Standard. But that threshold is arbitrary.

The ROC curve shows how the trade-off between true positive rate (catching Reserve) and false positive rate (flagging Standard as Reserve) changes across all possible thresholds. AUC (Area Under the Curve) summarizes discrimination ability in a single number -- how well the model separates the two classes regardless of threshold choice.

Direct AI to plot ROC curves for both models on the same axes with AUC values in the legend. A model that perfectly separates Reserve from Standard has AUC = 1.0. A model that guesses randomly has AUC = 0.5.
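The plot described above can be sketched as follows; the data is synthetic and the model configurations are assumptions, but the `roc_curve` / `roc_auc_score` pattern is standard scikit-learn:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, roc_auc_score

# Toy data with a noisy but learnable "Reserve" label (1).
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(0, 0.7, 500) > 0.8).astype(int)

models = {
    "logistic regression": LogisticRegression().fit(X, y),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y),
}
fig, ax = plt.subplots()
aucs = {}
for name, m in models.items():
    probs = m.predict_proba(X)[:, 1]      # probability of the Reserve class
    fpr, tpr, _ = roc_curve(y, probs)
    aucs[name] = roc_auc_score(y, probs)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.2f})")
ax.plot([0, 1], [0, 1], "k--", label="random guess (AUC = 0.5)")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate (recall)")
ax.legend()
fig.savefig("roc_curves.png")
```

Both curves share one set of axes, so you can see directly which model dominates at the false-positive rates Luciana can tolerate.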

Step 4: Threshold tuning

The default 0.5 probability threshold does not serve Luciana's priorities. She wants high recall -- catch the Reserve barrels even at the cost of some false alarms. A lower threshold (predict Reserve for any barrel with probability above 0.3, say) will catch more true Reserves but also flag more Standard barrels.

Direct AI to show how precision and recall change at different thresholds. Find a threshold that matches Luciana's stated preference: high recall, acceptable precision. There is no single right answer -- this is a trade-off you are making on behalf of the client based on what she told you.
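A sketch of the threshold sweep, assuming Luciana's preference translates to "recall of at least 0.90, then the best precision available" (the 0.90 floor, the data, and the model are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Toy data with a noisy "Reserve" label (1).
rng = np.random.default_rng(2)
X = rng.normal(size=(600, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 600) > 1).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]

# prec/rec have one more entry than thr; drop the last to align with thr.
prec, rec, thr = precision_recall_curve(y, probs)
ok = rec[:-1] >= 0.90                       # thresholds meeting the recall floor
best = int(np.argmax(np.where(ok, prec[:-1], -1.0)))  # best precision among them

print(f"threshold={thr[best]:.2f}  precision={prec[best]:.2f}  rec={rec[best]:.2f}")
```

At deployment you would apply the chosen threshold with `probs >= thr[best]` instead of calling `predict`, which is hard-wired to 0.5.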

Step 5: The interpretability trade-off

Luciana asked for "something I can explain to my export partners." Logistic regression produces coefficients: "each degree of fermentation temperature above 28C reduces Reserve probability by X%." Decision trees produce rules: "if altitude > 1100m and aging > 18 months, predict Reserve." Both are explainable, but in different ways.
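The two kinds of explanation can be pulled straight from the fitted models. This sketch uses toy data; the feature names are assumptions matching the narrative, not the actual dataset columns:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["altitude_m", "aging_months", "fermentation_temp_c"]  # assumed names
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
y = ((X[:, 0] > 0.5) | (X[:, 1] > 1.0)).astype(int)  # toy "Reserve" label

# Coefficient story: sign and magnitude of each feature's effect (log-odds).
logit = LogisticRegression().fit(X, y)
for name, coef in zip(features, logit.coef_[0]):
    print(f"{name}: {coef:+.2f}")

# Rule story: a human-readable if/then breakdown of the tree's splits.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=features)
print(rules)
```

Either printout can anchor the memo: coefficients read as "more of X raises the Reserve odds", rules read as "barrels meeting these conditions are Reserve".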

AI defaults to whichever model produces the best metric on the training data. It does not consider whether the client needs to understand how the model works. That consideration is yours.

Which model best balances performance and explainability for Luciana's audience?

Step 6: Select and justify

Choose the model and threshold combination that best serves Luciana. Document in the methodology memo:

  • Which model you selected and why
  • What threshold you chose and the precision-recall trade-off at that threshold
  • How the model's outputs can be explained to export partners

Cross-check your methodology against the brief: does the metric choice match Luciana's priorities? Does the threshold serve her stated preference? Is the model interpretable enough for her audience?

✓ Check

Check: Comparison uses precision/recall/AUC. ROC curves plotted. Threshold adjusted. Model selection justified with interpretability.