Understanding Training Output

2026-02-10

This is part of a series on ML for generalists; you can find the start here.

This is the output I get from our train.py on my machine, an aging Intel Mac:

% python train.py 
/Users/brian/src/whichway/.venv/lib/python3.12/site-packages/torch/nn/modules/lazy.py:181: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.
  warnings.warn('Lazy modules are a new feature under heavy development '
[Epoch 1/10] after 200.1 seconds:
              loss: 0.2828
  within 5 degrees: 45.50%
 within 10 degrees: 75.50%
 within 20 degrees: 94.00%

[Epoch 2/10] after 188.5 seconds:
              loss: 0.0387
  within 5 degrees: 46.00%
 within 10 degrees: 72.50%
 within 20 degrees: 95.50%

[Epoch 3/10] after 188.5 seconds:
              loss: 0.0270
  within 5 degrees: 46.50%
 within 10 degrees: 79.00%
 within 20 degrees: 97.00%

[Epoch 4/10] after 187.3 seconds:
              loss: 0.0200
  within 5 degrees: 54.50%
 within 10 degrees: 84.00%
 within 20 degrees: 98.00%

[Epoch 5/10] after 212.1 seconds:
              loss: 0.0147
  within 5 degrees: 54.00%
 within 10 degrees: 86.50%
 within 20 degrees: 97.50%

[Epoch 6/10] after 218.6 seconds:
              loss: 0.0111
  within 5 degrees: 64.50%
 within 10 degrees: 91.00%
 within 20 degrees: 97.00%

[Epoch 7/10] after 218.8 seconds:
              loss: 0.0094
  within 5 degrees: 63.00%
 within 10 degrees: 90.00%
 within 20 degrees: 98.00%

[Epoch 8/10] after 215.1 seconds:
              loss: 0.0083
  within 5 degrees: 65.50%
 within 10 degrees: 92.00%
 within 20 degrees: 98.50%

[Epoch 9/10] after 210.2 seconds:
              loss: 0.0066
  within 5 degrees: 68.50%
 within 10 degrees: 93.50%
 within 20 degrees: 98.50%

[Epoch 10/10] after 210.6 seconds:
              loss: 0.0054
  within 5 degrees: 70.00%
 within 10 degrees: 93.50%
 within 20 degrees: 98.50%

After our first training epoch, we're already getting within 20° of the right answer for 94% of our images. That's a good start. By epoch 10, loss has dropped to 0.0054 and we're within 5° for 70% of our images. Our model is learning and our numbers are headed in the right direction.
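Those "within N degrees" lines are accuracy buckets: the fraction of images whose predicted angle lands within 5°, 10° or 20° of the true answer. Here's a minimal sketch of how a bucket like that could be computed; the names are mine, not necessarily what train.py does:

import torch

def accuracy_within(predicted_degrees, true_degrees, threshold):
    # Angles wrap around: predicting 359° when the answer is 1° is only 2° off.
    diff = (predicted_degrees - true_degrees).abs() % 360
    error = torch.minimum(diff, 360 - diff)
    # Fraction of predictions whose error is at or under the threshold.
    return (error <= threshold).float().mean().item()

# accuracy_within(preds, targets, 5) -> 0.70 would show up as "within 5 degrees: 70.00%"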

Loss is a measure of how wrong our model's predictions are. We're using Mean Squared Error (MSE), which penalises "wronger" answers more heavily because it squares the difference between the prediction and the correct answer.

Lower is better.
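To make that concrete, here's MSE computed by hand on a few made-up predictions. This is just an illustration of the formula, not our actual training code:

import torch

predictions = torch.tensor([10.0, 90.0, 200.0])   # hypothetical model outputs
targets     = torch.tensor([12.0, 80.0, 350.0])   # the correct answers

errors = predictions - targets                    # -2, 10, -150
mse = (errors ** 2).mean()                        # (4 + 100 + 22500) / 3 ≈ 7534.7
print(mse)

# PyTorch's built-in loss gives the same number.
print(torch.nn.MSELoss()(predictions, targets))

The 150-unit miss contributes almost all of the loss, which is the "wronger answers are penalised more heavily" behaviour in action.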

Here's our loss over all 10 epochs:

loss: 0.2828
loss: 0.0387
loss: 0.0270
loss: 0.0200
loss: 0.0147
loss: 0.0111
loss: 0.0094
loss: 0.0083
loss: 0.0066
loss: 0.0054

It's decreasing steadily, slowing as the model improves.
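If you'd rather see that as a curve than a column of numbers, a few lines of matplotlib will plot it. This is just for eyeballing; train.py doesn't need it:

import matplotlib.pyplot as plt

losses = [0.2828, 0.0387, 0.0270, 0.0200, 0.0147,
          0.0111, 0.0094, 0.0083, 0.0066, 0.0054]

plt.plot(range(1, 11), losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.yscale("log")   # a log scale makes the later, smaller improvements visible
plt.show()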

Is there more value to get out of our training data? Let's crank num_epochs up to 20 and see what happens:

[Epoch 18/20] after 211.4 seconds:
              loss: 0.0005
  within 5 degrees: 80.00%
 within 10 degrees: 95.00%
 within 20 degrees: 99.00%

[Epoch 19/20] after 211.3 seconds:
              loss: 0.0004
  within 5 degrees: 81.50%
 within 10 degrees: 95.00%
 within 20 degrees: 99.00%

[Epoch 20/20] after 207.5 seconds:
              loss: 0.0003
  within 5 degrees: 78.00%
 within 10 degrees: 94.50%
 within 20 degrees: 99.00%

There's improvement, but we're at the point of diminishing returns. In fact, our last epoch scores slightly worse on accuracy than the one before it, even though loss kept falling.

You might see the occasional epoch where loss barely changes or even increases, and accuracy might dip too. If it happens once or twice, it isn't something to be concerned about.

If there's a sustained drop in performance, that would indicate a problem.

We generated 1,000 sample images in total back in this post:

python generate.py --count 800 --output data-train
python generate.py --count 200 --output data-test

That 80% train / 20% test split is a common way to divide a dataset.

It's rare you'll have a problem where you can generate as much data as you want. You're more likely to have a limited labelled dataset (samples with a known correct answer).
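If you do find yourself with one fixed labelled dataset rather than a generator script, you can make the same 80/20 split in code. Here's a sketch using PyTorch's random_split, with dummy tensors standing in for real images and labels (the image size and seed are made up):

import torch
from torch.utils.data import TensorDataset, random_split

images = torch.randn(1000, 3, 64, 64)   # 1,000 fake "images"
angles = torch.rand(1000) * 360         # 1,000 fake labels

full_dataset = TensorDataset(images, angles)
n_train = int(0.8 * len(full_dataset))   # 800
n_test = len(full_dataset) - n_train     # 200

train_set, test_set = random_split(
    full_dataset,
    [n_train, n_test],
    generator=torch.Generator().manual_seed(42),  # fixed seed so the split is reproducible
)
print(len(train_set), len(test_set))  # 800 200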

What would happen if we had less data?