Make a Model
2026-02-05
This is part of a series on ML for generalists; you can find the start here.
We're at the fun bit, where we decide on our model architecture and build out the layers.
The first surprise is just how little code this takes.
# train.py
import torch
import torch.nn as nn


class OrientationModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Five convolution + pooling blocks: each convolution keeps the spatial
        # size (kernel_size=3 with padding=1), then MaxPool2d halves it.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.flatten = nn.Flatten()
        # Two fully connected layers; LazyLinear infers its input size from the
        # flattened feature maps on the first forward pass.
        self.fc = nn.Sequential(
            nn.LazyLinear(out_features=64),
            nn.ReLU(),
            nn.Linear(in_features=64, out_features=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x
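If you want to convince yourself of the shapes, here's a quick smoke test. It's a minimal sketch rather than part of the training script, and the 128x128 input size is my assumption; the only thing the model itself fixes is the single grayscale channel.

model = OrientationModel()
dummy = torch.randn(4, 1, 128, 128)  # batch of 4, 1 channel, 128x128 pixels (size assumed)
out = model(dummy)                   # the first call also materialises the LazyLinear layer
print(out.shape)                     # torch.Size([4, 2]): one (sine, cosine) pair per image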
Stay Calm
It's OK: the code looks strange, but it will make sense. Let's cover the important bits first.
There are 5 convolutional layers:
nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
and 2 fully connected layers:
nn.LazyLinear(out_features=64),
nn.Linear(in_features=64, out_features=2),
The short version:
The 2-dimensional convolutional layers identify patterns in 2D space by looking at small squares of the image (kernel_size=3 gives us a 3x3 pixel square). It's a bit like an artist squinting at a landscape to block out rough shapes and features.
The fully connected layers take all those detected patterns and combine them to predict our final answer: the sine and cosine of the angle (out_features=2).
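Once the model spits out that (sine, cosine) pair, turning it back into an angle is a single atan2 call. Here's a tiny sketch with made-up numbers; the column order (sine first) is my assumption, not something the model enforces.

import torch

pred = torch.tensor([[0.0, 1.0], [1.0, 0.0]])   # pretend model output: [sine, cosine] per image
sin, cos = pred[:, 0], pred[:, 1]
angles = torch.rad2deg(torch.atan2(sin, cos))   # tensor([ 0., 90.])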
Terminology
Even though our model is made up of convolution layers and fully connected layers, the convention is to call our entire model a Convolutional Neural Network (CNN).
In ML-speak, these are the hidden layers of our model: they sit between the input and the output, tucked away inside the model where you never see their values directly (a bit like encapsulation).
If you look at your favourite model on HuggingFace, you'll see the number of hidden layers it has buried in the config.json.
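For instance, here's one way to pull that number out programmatically. This is a sketch that assumes the transformers library and uses bert-base-uncased as a stand-in for "your favourite model".

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")  # fetches that model's config.json
print(config.num_hidden_layers)                           # 12 for this checkpoint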
Is This a Good Model?
That depends on what "good" means. The typical tradeoff in machine learning is speed vs. accuracy, and where you land on that spectrum depends on the problem you're solving. A model that needs to respond in milliseconds will look different to one that can mull on a batch job overnight.
For now, I'm not worried about either. This is just our starting point. Over the next few posts I'll break down each component in the model individually, then we'll wire them together and start training.
Once we have end-to-end training and evaluation working, that's when the real experimentation begins. We can start changing the shape of the model and see if those changes help.
The Longer Version
We're going to take a quick look at each of the concepts in our model:
- Convolutional Layers
- Fully Connected Layers
- ReLU Activation
We're building our model's neural network from convolutional layers and fully connected layers, with ReLU activations between them.
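As a tiny preview of the ReLU part: all it does is clamp negative values to zero, and that little non-linearity is what stops the stacked layers from collapsing into one big linear function.

import torch
import torch.nn as nn

relu = nn.ReLU()
print(relu(torch.tensor([-2.0, -0.5, 0.0, 3.0])))  # tensor([0., 0., 0., 3.])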
Then we'll get to the training bit, I promise.
First up: convolutional layers.