Using Our Model
2026-02-14
This is part of a series on ML for generalists; you can find the start here.
We've trained our model, and it's ready to use. How do we do that?
Here's everything we need to use our model. I've copied and pasted the model class into this file too. In practice, you'd put this somewhere easily importable.
You can download the complete correct.py here, or check out a nicer version on GitHub.
# correct.py
import argparse
import math
from pathlib import Path

from PIL import Image
import torch
from torch import nn
from torchvision import transforms


class OrientationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.flatten = nn.Flatten()
        self.fc = nn.Sequential(
            nn.LazyLinear(out_features=64),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(in_features=64, out_features=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x


def predict_angle(model_path: Path, image: Image.Image) -> torch.Tensor:
    to_tensor = transforms.ToTensor()
    image_tensor = to_tensor(image)
    image_tensor = image_tensor.unsqueeze(0)

    model = OrientationModel()
    model.load_state_dict(torch.load(model_path))
    model.eval()

    with torch.no_grad():
        output = model(image_tensor)[0]
    return torch.atan2(output[0], output[1]) * 180 / math.pi


parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=Path, default="orientation-model.pth", help="path to saved model state dict")
parser.add_argument("image_path", type=Path, help="path of PNG image to correct")

if __name__ == "__main__":
    args = parser.parse_args()
    image = Image.open(args.image_path).convert("L")
    predicted_angle = predict_angle(args.model, image)
    print(f"{predicted_angle=}")
    image.show()
    corrected_image = image.rotate(predicted_angle)
    corrected_image.show()
Before we step through the code, let's make sure we have a well-trained model.
The Last Train
Since we get to generate as much data as we like for this problem, we'll treat ourselves to a large training dataset.
python generate.py --output data-train --count 2000
Then we'll train our model again:
python train.py
If you're using the version with augmentation in the OrientationDataset, you can turn that off:
train_dataset = OrientationDataset(Path("data-train/answersheet.json"), augment=False)
Here's how my last epoch looks. I'll cover running this on a GPU soon; almost 10 minutes per epoch is painful!
[Epoch 10/10] after 510.3 seconds:
loss: 0.0247
-- TEST --
within 5 degrees: 80.00%
within 10 degrees: 97.00%
within 20 degrees: 99.00%
-- TRAIN --
within 5 degrees: 90.95%
within 10 degrees: 99.55%
within 20 degrees: 99.95%
The last line of our train.py script saves all our model weights:
torch.save(model.state_dict(), "orientation-model.pth")
We'll use that orientation-model.pth in our correct.py script.
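If you're curious what's actually in that file: a state dict is just a mapping from parameter names to weight tensors. Here's a sketch of the save/load round trip using a single stand-in layer (not our full OrientationModel):

```python
import torch
from torch import nn

# A state dict maps each parameter's name to its tensor of weights.
tiny = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
for name, tensor in tiny.state_dict().items():
    print(name, tuple(tensor.shape))
# weight (4, 1, 3, 3)
# bias (4,)

# torch.save / load_state_dict round-trip that mapping through a file:
torch.save(tiny.state_dict(), "/tmp/tiny.pth")
restored = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
restored.load_state_dict(torch.load("/tmp/tiny.pth"))
```

Note that load_state_dict needs a freshly constructed instance of the same architecture to pour the weights into, which is why correct.py has its own copy of the model class.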
Making Predictions
Already, some of this code should be starting to look familiar:
def predict_angle(model_path: Path, image: Image.Image) -> torch.Tensor:
    to_tensor = transforms.ToTensor()
    image_tensor = to_tensor(image)
    image_tensor = image_tensor.unsqueeze(0)

    model = OrientationModel()
    model.load_state_dict(torch.load(model_path))
    model.eval()

    with torch.no_grad():
        output = model(image_tensor)[0]
    return torch.atan2(output[0], output[1]) * 180 / math.pi
The first three lines convert the PIL Image instance to a tensor. Our model was trained on batches of images, so unsqueeze(0) adds a batch dimension to the tensor:
>>> image = Image.open("data-train/sample-0059.png").convert("L")
>>> image_tensor = to_tensor(image)
>>> image_tensor
tensor([[[1., 1., 1., ..., 1., 1., 1.], # 3 dimensions [[[...]]]
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]]])
>>> image_tensor.unsqueeze(0)
tensor([[[[1., 1., 1., ..., 1., 1., 1.], # 4 dimensions [[[[...]]]]
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]]]])
This makes sure our input is in the shape our model expects.
When we convert the image to a tensor, it has dimensions: [colour, height, width]
When we unsqueeze(0) it, it has [batch, colour, height, width]
Why is it called unsqueeze? There's a squeeze function that removes dimensions of size 1; unsqueeze is the inverse: it adds a dimension of size 1.
The naming comes from the idea that a dimension of size 1 can be "squeezed out", because it doesn't carry any extra information. Adding one back is "unsqueezing".
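A quick sketch of both directions, watching only the shapes:

```python
import torch

x = torch.zeros(1, 28, 28)       # [colour, height, width]
batched = x.unsqueeze(0)         # insert a new size-1 dimension at position 0
print(batched.shape)             # torch.Size([1, 1, 28, 28])

print(batched.squeeze(0).shape)  # torch.Size([1, 28, 28]) -- squeezed back out

# squeeze() with no argument removes *every* size-1 dimension:
print(batched.squeeze().shape)   # torch.Size([28, 28])
```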
Loading Our Model
Nothing surprising here: create an instance of our model class and load the trained weights from the file on disk.
model = OrientationModel()
model.load_state_dict(torch.load(model_path))
model.eval()
Once our model is loaded, we'll put it in eval mode, so components like nn.Dropout() don't mess with our predictions.
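To see what eval mode actually changes, here's a small demonstration with nn.Dropout on its own, separate from our model:

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()    # training mode: zeroes elements at random, scales survivors by 1/(1-p)
print(dropout(x))  # a mix of 0.0 and 2.0, different every call

dropout.eval()     # eval mode: dropout does nothing
print(dropout(x))  # all ones, every time
```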
In a real system, we'd probably keep an instance of our loaded model around, rather than load it for each call.
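One way to sketch that caching: memoise the loading function so it only runs once per path. I'm using a tiny stand-in model here so the example runs on its own; in correct.py you'd use OrientationModel and the real weights path.

```python
from functools import lru_cache

import torch
from torch import nn


class TinyModel(nn.Module):
    """Stand-in for OrientationModel, just to keep the sketch self-contained."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


@lru_cache(maxsize=1)
def get_model(model_path: str) -> nn.Module:
    # Runs once per path; later calls return the same cached, eval-mode instance.
    model = TinyModel()
    model.load_state_dict(torch.load(model_path))
    model.eval()
    return model


torch.save(TinyModel().state_dict(), "/tmp/tiny-model.pth")
assert get_model("/tmp/tiny-model.pth") is get_model("/tmp/tiny-model.pth")
```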
Getting An Answer
You might remember torch.no_grad() from the evaluation part of our training. PyTorch records operations on tensors so it can compute the gradients that update our model; when we're not training, we can turn that tracking off.
with torch.no_grad():
    output = model(image_tensor)[0]
return torch.atan2(output[0], output[1]) * 180 / math.pi
We called our model with a "batch" of 1 image, so we take the first result it returns. These will be the two outputs of our final layer:
nn.Linear(in_features=64, out_features=2)
and a little trigonometry turns the sine and cosine back into an angle.
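To sketch that round trip: if the model's two outputs were the exact sine and cosine of the rotation, torch.atan2 would recover the angle in radians, and multiplying by 180/π converts back to degrees:

```python
import math
import torch

angle_degrees = 30.0
radians = torch.tensor(angle_degrees * math.pi / 180)

# what a perfect model would output: [sin, cos]
output = torch.stack([torch.sin(radians), torch.cos(radians)])

recovered = torch.atan2(output[0], output[1]) * 180 / math.pi
print(recovered)  # ~30, up to floating point error
```

Our real model's outputs won't be an exact sine/cosine pair, but atan2 doesn't care: it finds the angle of whatever (y, x) point the model produces.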
Run It
Here's what I get when I run:
% python correct.py data-test/sample-0059.png
predicted_angle=tensor(-95.0066)

That's It
Congratulations! You've built a model to correctly orient images. Go you!
There are just two more details to cover for this tutorial and we're done.
Next is backpropagation, the mechanism that trains our models so we don't have to configure convolution kernels ourselves.
Finally, how do we make these training epochs much shorter by moving work to GPUs?