Learn by Directing AI
Unit 4

PyTorch Training Loop

Step 1: The Conceptual Shift

Up to now, training a model meant calling .fit(). Scikit-learn handled everything internally -- you passed in data and got a trained model back. You didn't need to know what happened inside.

PyTorch works differently. You write the training loop yourself: forward pass (run the data through the model), loss computation (measure how wrong the predictions are), backward pass (compute gradients), gradient update (adjust the weights), and gradient zeroing (clear the gradients for the next iteration). Each step has a purpose. Skip or misorder any step and the model trains without learning.

This is a conceptual shift, not just a complexity increase. Understanding what each step does is what lets you verify whether AI-generated training code is correct.

Step 2: Set Up the PyTorch Pipeline

Direct Claude to convert your engineered features into PyTorch tensors and define a simple neural network for regression. The model predicts yield (a continuous number), so the output layer has one neuron and the loss function should be MSELoss -- mean squared error for regression.

Review what Claude produces. The architecture should make sense: input dimensions matching your feature count, one or two hidden layers, ReLU activations, and a single output for the predicted yield. The loss function matters -- if Claude selects a classification loss (like CrossEntropyLoss) on a regression problem, the model will train toward the wrong objective. MSELoss is the right choice here because you're predicting a continuous value.
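A minimal sketch of what a reasonable result might look like. The feature count, layer widths, and variable names here are illustrative assumptions, not the project's actual values:

```python
import torch
import torch.nn as nn

# Hypothetical feature count -- replace with the width of your engineered feature matrix.
n_features = 12

# Converting engineered features (e.g. NumPy arrays) to float32 tensors would look like:
# X_train = torch.tensor(X_train_np, dtype=torch.float32)
# y_train = torch.tensor(y_train_np, dtype=torch.float32).unsqueeze(1)  # shape (n, 1)

model = nn.Sequential(
    nn.Linear(n_features, 32),  # input width matches the feature count
    nn.ReLU(),
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 1),           # single output neuron: the predicted yield
)
loss_fn = nn.MSELoss()          # mean squared error -- regression, not classification
```

The single output neuron and MSELoss are the two things to verify first; CrossEntropyLoss here would be the red flag described above.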

Step 3: Write the Training Loop

Direct Claude to write the training loop. Review the code carefully.

AI commonly gets the gradient management mechanics wrong in ways that are invisible in the syntax but visible in the results. The key question: where does optimizer.zero_grad() appear? It must run before loss.backward() computes fresh gradients -- conventionally at the top of the loop, before the forward pass. If it lands between backward() and step(), the optimizer updates with zeroed-out gradients; if it's missing entirely, gradients accumulate across iterations and the model converges to a poor solution.

Check that the loop tracks both training loss and validation loss at each epoch. You need both to see what's happening during training.
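A runnable sketch of the loop shape to check for, with the five steps in order and both losses tracked. The toy data, layer sizes, and epoch count are stand-ins, not the project's values:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the loop runs end to end; in the project these are your split tensors.
torch.manual_seed(0)
X_train = torch.randn(40, 3)
y_train = X_train.sum(dim=1, keepdim=True)   # learnable synthetic target
X_val = torch.randn(8, 3)
y_val = X_val.sum(dim=1, keepdim=True)

model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

train_losses, val_losses = [], []
for epoch in range(20):
    model.train()
    optimizer.zero_grad()           # clear stale gradients BEFORE the forward pass
    pred = model(X_train)           # forward pass
    loss = loss_fn(pred, y_train)   # loss computation
    loss.backward()                 # backward pass: compute gradients
    optimizer.step()                # gradient update: adjust the weights
    train_losses.append(loss.item())

    model.eval()
    with torch.no_grad():           # no gradients needed for validation
        val_losses.append(loss_fn(model(X_val), y_val).item())
```

Recording one training and one validation loss per epoch is what makes the loss curves in the next step possible.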

Step 4: Train and Monitor Loss Curves

Run the training. Direct Claude to plot the training and validation loss curves.

Watch what happens over epochs. Training loss should decrease -- the model is learning the training data. Validation loss should also decrease initially -- the model is learning the underlying pattern. But at some point, validation loss may start increasing while training loss keeps going down. That divergence is overfitting: the model is memorizing training examples instead of learning the pattern that generalizes.

Early stopping is the guard. It monitors validation loss and stops training when the loss stops improving. Direct Claude to add early stopping. The patience parameter matters -- too high and the model overfits before stopping, too low and it stops before the model has learned enough. Ask Claude to justify the patience value it chooses.
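The mechanism is just a counter over validation losses. In this sketch the loss sequence is fake and the patience value of 3 is a hypothetical choice, but the reset-on-improvement logic is the part worth verifying in whatever Claude produces:

```python
# Minimal early-stopping sketch: stop once validation loss has failed to
# improve for `patience` consecutive epochs. The curve below is illustrative.
patience = 3
best_val = float("inf")
epochs_without_improvement = 0
stopped_at = None

val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.61, 0.65]  # fake curve that turns upward

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:
        best_val = val_loss
        epochs_without_improvement = 0   # improvement resets the counter
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= patience:
        stopped_at = epoch               # loss stalled for `patience` epochs in a row
        break
```

Here the best loss is 0.55 at epoch 3, and training stops at epoch 6 after three epochs without improvement. In practice you would also checkpoint the model weights at the best epoch, since the final weights belong to the overfit epochs.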

If training stops at epoch 15 because validation loss started rising, that's a meaningful decision you can explain: "The model learned the yield pattern from the sensor data in 15 epochs. Training longer would have memorized the specific farms instead of learning the weather-to-yield relationship."

Step 5: MLflow Experiment Tracking

Log the PyTorch training run to MLflow. Track the hyperparameters (learning rate, number of epochs, architecture), the training and validation metrics, and the model artifact.

Neural network training is less deterministic than scikit-learn's. Even with the same code and data, two runs can produce slightly different results because of non-deterministic GPU operations, data loader shuffling, and floating-point accumulation order. Set seeds for PyTorch, numpy, and Python's random module. Note that even with seeds, some variation may remain. That's why logging to MLflow matters -- without it, "which run produced the good result?" is unanswerable.
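Seeding all three sources can be collected into one helper. This is a sketch; the function name and default seed are arbitrary choices:

```python
import random
import numpy as np
import torch

def set_seeds(seed: int = 42) -> None:
    """Seed the three RNG sources a typical PyTorch pipeline touches."""
    random.seed(seed)        # Python's random module (e.g. shuffling logic)
    np.random.seed(seed)     # NumPy (feature engineering, splits)
    torch.manual_seed(seed)  # PyTorch CPU RNG (also seeds all CUDA devices)
    # Even with seeds set, some GPU kernels stay non-deterministic;
    # torch.use_deterministic_algorithms(True) can surface those, at a speed cost.

set_seeds(42)
a = torch.randn(3)
set_seeds(42)
b = torch.randn(3)  # identical to `a` after reseeding
```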

Step 6: Baseline Comparison

Train a scikit-learn baseline on the same temporal split. A RandomForestRegressor or LinearRegression gives you a reference point. Log this run to MLflow alongside the PyTorch run.
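A sketch of the baseline on synthetic stand-in data -- the 48-row shape mirrors the dataset size mentioned below, but the features and target here are fabricated for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the 48 harvest records; replace with the real feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 5))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=48)  # hypothetical yield signal

# Temporal split: earlier records train, later records test -- no shuffling.
X_train, X_test = X[:36], X[36:]
y_train, y_test = y[:36], y[36:]

baseline = RandomForestRegressor(n_estimators=100, random_state=0)
baseline.fit(X_train, y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
```

Computing the same metric (MSE) on the same split is what makes the MLflow comparison with the PyTorch run meaningful.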

Compare the two in MLflow. The PyTorch model isn't automatically better -- on a small agricultural dataset with 48 harvest records and limited features, a simpler model might match or beat a neural network. The comparison tells you whether the added complexity of PyTorch provides real value for Valentina's prediction problem.

If the scikit-learn baseline performs similarly, that's a finding worth documenting. Complexity without improvement is a cost, not a benefit.

✓ Check

The PyTorch training loop zeros gradients before the forward pass. Training and validation loss curves are plotted. Early stopping is configured. Both PyTorch and scikit-learn runs are logged in MLflow with comparable metrics.