The Generalist's Guide to Machine Learning

2026-02-01

I'm a generalist software engineer. I think I'm a generalist in most aspects of my life. I've never wanted to be pigeonholed as a front-end or back-end dev. Same for languages: I'm not a Python person or a Kotlin kid. I can't solve complete problems without knowing a little about a lot.

I always considered machine learning an exception. A domain too specialised to understand, with lots of maths and academic knowledge required. The closest I came was translating Jupyter Notebooks into applications that could run in production.

That's changed. Even when the next AI winter comes (and it'll be a cold one), people will still expect that little sprinkle of ML magic in their applications.

This is the full series. Each post is short, less than 20 minutes of your time.

When I was sixteen years old, I was given an old 14.4k dial-up modem and access to the Internet (at off-peak times, of course). It felt overwhelming; I have such a clear memory of the thought: I'll never understand this, I've come to this too late.

One day, I stumbled across Carolyn P. Meinel's The Happy Hacker: A Guide to (Mostly) Harmless Computer Hacking, and it's not an exaggeration to say that it set me on a path for the rest of my life.

She made learning about DNS fun!

With the explosion of LLMs over the last few years, I had that same feeling again: I've come to this too late. But that's never true; in fact, it's easier than ever. This is my humble contribution in the spirit of The Happy Hacker (except Carolyn actually knew what she was talking about).

Learning about ML has made my computer fun again. You can have a long and profitable career writing what amounts to PostgreSQL wrapped in a web app. ML feels like a magic trick and I'm seeing behind the curtain.

When you approach it as a software engineer, machine learning looks confusing. The libraries are strange even when the language is familiar. pandas is the most popular way to manipulate datasets in the data science world, but the operator overloading magic it relies on still makes my head spin.
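A quick illustration of that operator-overloading magic (the DataFrame here is made up for this post, not part of the series):

```python
import pandas as pd

df = pd.DataFrame({"name": ["ada", "bob", "cy"], "score": [91, 55, 78]})

# `df["score"] > 60` looks like it should produce a single boolean, but
# pandas overloads `>` to return a whole Series of booleans, one per row...
mask = df["score"] > 60

# ...and overloads `[]` so that indexing with that Series filters the rows.
passed = df[mask]
```

None of this is ordinary Python semantics; it only makes sense once you know pandas has redefined `>` and `[]` for its own types.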

My most significant realisation:

As a software engineer I write Python for machines to execute and humans to understand.

When a data scientist writes Python, they're writing executable maths.

With that in mind, the conventions, syntax, and odd variable names all become a bit less incomprehensible, a little less frustrating. Smart, kind people have packaged up the maths for you.
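Here's a toy example of what I mean, assuming NumPy (this snippet isn't from the series, just an illustration):

```python
import numpy as np

# Mean squared error, as a mathematician writes it:
#   MSE = (1/n) * sum_i (y_pred_i - y_true_i)^2
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

# ...and as a data scientist writes it. The code *is* the formula:
# subtract, square, average -- no loops, no index bookkeeping.
mse = ((y_pred - y_true) ** 2).mean()
```

Read it as maths rather than as imperative code and the terse style stops looking sloppy and starts looking like notation.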

I'm not a data scientist. I'm not an expert on anything. There may be typos, mistakes, or inaccuracies in here, but I've asked some people I trust to read through, and we've hopefully caught the most obvious problems.

This subject is almost as new to me as it is to you, so the confusing aspects are still fresh in my mind. Many tutorials I followed started by pulling a dataset from HuggingFace, which is about the quickest way to get started, but it left me with no sense of the shape of my data or what my model was really doing.

If you spot a mistake or have some feedback, you can email me with my first name at this domain.

My aim is to trace a path to a useful, tangible outcome, filling in some of the details as we go and coming back to deeper topics once we have a general sense of what's going on.

When you wrote your first function in the first programming language you learned, you probably weren't aware of the differences between passing by reference and passing by value. You got to grips with the idea of a callable chunk of code first, then filled in the details. Same idea here.

🤓
Well, actually... Some of my best friends know mathematicians
There are 🤓 pedantry notes in some parts of the tutorial. You can safely ignore these, they're part of the agreement I've made with the mathematicians.

First, we need to talk about how we're doing this.

The data is the important bit. It's the hardest part, the slowest part, the everything part. Get the data right and the ML techniques are very understandable.

Finding, compiling, and cleaning a dataset for your model is the most time-consuming aspect in my experience. So we're going to start with a problem where we can cheaply generate as much good data as we want.

Our model will be a convolutional neural network (CNN) to horizontally align rotated images.

Given a rotated image containing text:

[image: a line of text, rotated off-horizontal]

Predict the rotation, in degrees, required to correct it:

[image: the same text, straightened]
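To make "cheaply generate as much good data as we want" concrete, here's a pure-NumPy sketch of the idea. The series' actual pipeline renders real text; `rotate` and `make_example` are hypothetical names, and the "text" is just a horizontal bar standing in for a line of text:

```python
import numpy as np

def rotate(img, degrees):
    """Rotate a 2-D array about its centre (nearest-neighbour sampling)."""
    theta = np.deg2rad(degrees)
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.indices((h, w))
    # Inverse mapping: for each output pixel, find its source pixel.
    sx = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    sy = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    sx, sy = np.rint(sx).astype(int), np.rint(sy).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[inside] = img[sy[inside], sx[inside]]
    return out

def make_example(rng):
    """One (input, label) pair: a rotated 'text line' and its correction angle."""
    img = np.zeros((32, 32), dtype=np.float32)
    img[15:17, 4:28] = 1.0           # a horizontal bar standing in for text
    angle = rng.uniform(-45, 45)     # how far we knock it off-level
    return rotate(img, angle), -angle  # the model should predict the correction
```

Because we apply the rotation ourselves, every example comes with a perfect label for free; that's what makes this such a forgiving first dataset.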

We'll use PyTorch and your CPU (no GPU required, but we'll cover them later).

This is a nice problem to start with because:

2-dimensional data can represent more than images; the model we design might be good at other problems too.

I made a nearly identical model to the one in this series that listens for a wake word (like Alexa or Hey Siri) through a USB microphone.

Stick with me, you've made it this far.

Read one post a day? Use that settling-in time when you're just back at your desk after lunch. By the end of the week you'll know what a CNN is, how it works, and what its inventor really thinks of Mark Zuckerberg.

The next post covers how to set up your repo.