Live in Belfast, Northern Ireland. Two cats.

Professional software engineer. Recovering people manager. Hobbyist data scientist.

You can find my ML for Generalists tutorial here.

Predicting BER with XGBoost

Continuing from my last post, can I predict building energy ratings from features a homeowner would already know?

Linear regression with 17 features (year of construction, number of storeys, etc.) explained 0.45 of the variance (R² = 0.45).

With just a few lines of code, I got that to 0.78.
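R² here is the fraction of the variance in the target that the predictions account for. It fits in a few lines of pure Python; the numbers below are made up for illustration:

```python
def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_total = sum((y - mean) ** 2 for y in y_true)                  # total variance
    ss_resid = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))     # unexplained variance
    return 1 - ss_resid / ss_total

# Made-up BER-ish numbers (kWh/m²/year):
actual = [100, 150, 300, 420]
predicted = [120, 140, 280, 400]
print(round(r_squared(actual, predicted), 3))
```

A model that always predicts the mean scores 0; perfect predictions score 1.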

Read more...

Brrr: Predicting Building Energy Ratings

If you've ever rented or bought a home in Ireland, you'll be familiar with Building Energy Ratings (BER). They tell you how cold you'll be in winter, graded from A1 (toasty) through to G (cold). "BER Exempt" means you're in for a rough time.

To get your BER, an assessor takes dozens of measurements, pokes at every vent, crevice and hole, then produces a number: the cost to heat the building in kWh/m²/year. That number gets bucketed into a category from A1 to G:

Rating    kWh/m²/year
A1        ≤ 25
A2        25 - 50
A3        50 - 75
B1        75 - 100
...       ...
E2        340 - 380
F         380 - 450
G         > 450
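The bucketing itself is just a threshold lookup. A minimal sketch, covering only the bands shown in the table (the real scale has more intermediate bands, elided here, and I'm assuming each band's upper edge is inclusive):

```python
# Upper bound (kWh/m²/year) for each band from the table above.
# Intermediate bands between B1 and E2 are elided in this sketch.
BANDS = [
    (25, "A1"), (50, "A2"), (75, "A3"), (100, "B1"),
    (380, "E2"), (450, "F"),
]

def ber_rating(kwh_per_m2_year: float) -> str:
    """Map an energy figure to its BER band; anything above 450 is G."""
    for upper, rating in BANDS:
        if kwh_per_m2_year <= upper:
            return rating
    return "G"

print(ber_rating(20))   # A1
print(ber_rating(500))  # G
```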

The Sustainable Energy Authority of Ireland (SEAI) is responsible for BER, and they handily publish an anonymised dataset.

Can I, a proven idiot, use machine learning to predict a home's BER using only information a homeowner is likely to know? No crevice exploration required?

This is a short series of posts to figure that out. What I tried, what worked, what didn't.

Read more...

Speed 2 but Gay

I want to take back control of the recommendation algorithms I'm exposed to.

Streaming platforms don't know what mood I'm in. To be honest, I rarely know myself; it's usually just a vague sense of what I don't want.

Sometimes I want something challenging, the kind of film Cillian Murphy has gushed about in an interview, where I'm left thinking about the central question of the narrative for days. Sometimes I want helicopters and explosions and exploding helicopters.

I usually want more queer media. Don't get me wrong, some of my best friends are straight, but more and more I'm just not in the mood for a hetero B-plot.

Can I build something that answers: Films like Speed 2, but gay?

Read more...

I paid €300 to make an LLM hate dogs

Direct Preference Optimisation (DPO) lets you shape an LLM's behaviour.

DPO is used for tasks like:

  • Refusals and safety mechanisms: don't say harmful stuff, don't tell people how to do bad things
  • Tone and style alignment: be concise, or talk in a certain way
  • Decontamination: make a model "unlearn" specific content, like copyright material or personal information
  • Format compliance: prefer valid JSON over word salad
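Under the hood, DPO's loss is a short expression over log-probabilities: how much more the model being trained prefers the "chosen" response over the "rejected" one, relative to a frozen reference copy. A sketch of the per-pair loss in plain Python (β and the log-probabilities are made-up numbers, not from my actual runs):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: push the policy to prefer 'chosen' over
    'rejected' by more than the frozen reference model does."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): small when 'chosen' is clearly preferred
    return -math.log(1 / (1 + math.exp(-margin)))

# Policy already leans towards the chosen answer more than the reference:
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

When the policy and reference agree exactly, the loss sits at log 2; it shrinks as the policy's preference for the chosen response grows.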

The big AI corps use RLHF (Reinforcement Learning from Human Feedback) and a ton of compute to do this.

How far can I get on a €300 second-hand RTX 3060 with 12GB of VRAM and some questions about dogs?

Qwen3 4B refusing to answer questions about dogs

Read more...

Running on GPUs

This is part of a series on ML for generalists, you can find the start here.

On my machine, a dataset of 2,000 training samples takes about 10 minutes (for a single epoch) on my CPU. That's not terrible for one run, but I need to iterate and experiment. I might change a hyperparameter (like learning rate) or tweak the architecture to add more layers, or bigger layers, or... you get the idea. Waiting around makes that painful.

Ten minutes per epoch means I can't realistically try more than a handful of changes in a session. Context switching is rough. Kick off a training run, go do something else, come back, check the results, try to remember what I was testing, hopefully I scribbled a note somewhere. Repeat. I can only change one variable at a time if I want to learn what's happening, so my progress crawls along.

Enter GPUs.

Read more...

Backpropagation

This is part of a series on ML for generalists, you can find the start here.

Backprop answers one question for every weight in your network: "how would the loss change if I tweaked this weight slightly?"

The answer is called the gradient. Once you have it, training is simple: nudge each weight in the direction that reduces the loss.

That's it! The rest is just working out the maths efficiently.
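For a single weight, that whole loop fits in a few lines. A toy sketch with a one-weight "network" and squared-error loss (all numbers made up):

```python
# One weight, one sample: prediction = w * x, loss = (prediction - target)²
x, target = 2.0, 10.0   # the ideal weight is therefore 5.0
w = 1.0                 # start somewhere wrong
lr = 0.05               # learning rate: how big a nudge to take

for _ in range(100):
    pred = w * x
    # Chain rule: dloss/dw = dloss/dpred * dpred/dw = 2*(pred - target) * x
    grad = 2 * (pred - target) * x
    w -= lr * grad      # nudge in the direction that reduces the loss

print(round(w, 3))      # converges to 5.0
```

Backprop proper is just this chain-rule step applied layer by layer, reusing intermediate results so you get every weight's gradient in one backward pass.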

Read more...

Data Augmentation

This is part of a series on ML for generalists, you can find the start here.

We know the data is the important part. It can take a lot of work to compile an accurately labelled dataset. How do we make sure we get the most out of what we have?
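One cheap answer is to derive extra samples from the ones we already have, with the label preserved. A minimal sketch using horizontal flips, with nested lists standing in for pixel arrays (real pipelines would use image libraries, but the idea is the same):

```python
def flip_horizontal(image):
    """Mirror an image left-to-right; the label stays the same."""
    return [row[::-1] for row in image]

image = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
label = "arrow"

# One labelled sample becomes two:
augmented = [(image, label), (flip_horizontal(image), label)]
print(len(augmented))
```

Flips, small rotations, crops and brightness shifts all work the same way: transformations that change the pixels but not what the image *is*.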

Read more...

Overfitting

This is part of a series on ML for generalists, you can find the start here.

Our loss dropped, but our test accuracy didn't improve. What's going on?

Our model might be learning the right answers to give for our training images but not how to predict the right answers using general features of any image. This is overfitting.

There are some approaches we can take:

  • More training data: we saw overfitting wasn't an issue when we had 800 training samples instead of 200
  • Make our model dumber by removing layers or making those layers smaller, so it has less capacity to learn our dataset
  • Make our model forget some data between each layer during training (dropout), so it's forced to find general patterns

First, let's confirm our suspicion and see if overfitting is the problem.
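The classic signature is training accuracy racing ahead of test accuracy. A tiny sketch of that check, with made-up numbers and an arbitrary threshold of my own choosing:

```python
def overfitting_gap(train_acc, test_acc, threshold=0.1):
    """Return the train/test accuracy gap and whether it looks suspicious."""
    gap = train_acc - test_acc
    return gap, gap > threshold

# Made-up numbers: loss dropped, training accuracy soared, test didn't follow
gap, suspicious = overfitting_gap(train_acc=0.98, test_acc=0.71)
print(round(gap, 2), suspicious)
```

If both accuracies are low and close together, the problem is underfitting instead, and making the model dumber would only hurt.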

Read more...

Less Training Data

This is part of a series on ML for generalists, you can find the start here.

In practice, we rarely get to generate as much labelled data as we want. Let's look at what happens when we have 200 training samples instead of 800:

python generate.py --count 200 --output data-train

Read more...