Speed 2 but Gay

2026-02-28

I want to take back control of the recommendation algorithms I'm exposed to.

Streaming platforms don't know what mood I'm in. I rarely know myself, to be honest; it's usually just a vague sense of what I don't want.

Sometimes I want something challenging, the kind of film Cillian Murphy has gushed about in an interview, where I'm left thinking about the central question of the narrative for days. Sometimes I want helicopters and explosions and exploding helicopters.

I usually want more queer media. Don't get me wrong, some of my best friends are straight, but more and more I'm just not in the mood for a hetero B-plot.

Can I build something that answers: Films like Speed 2, but gay?

As someone who's never built a recommendation engine before, here's my plan:

  1. Download a Wikipedia dump and extract all articles about films
  2. Extract the Plot section and use a local LLM to summarise the plot in a sentence or two
  3. Get the LLM to extract some features, e.g. comedy, drama, romance, thriller, lgbt, western, sci-fi
  4. Use a local embedding model to embed the summarised plot so I can find related films from a seed (Speed?) film
  5. Filter the results using the extracted features, e.g. show nearest films that have the lgbt feature and hide all sports biopics
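
Steps 4 and 5 boil down to a nearest-neighbour search over the embeddings with a boolean filter on top. Here's a minimal sketch of that query, with toy data standing in for the real embeddings and features (all names and values here are illustrative):

```python
import numpy as np

# Toy stand-ins for the real data: one embedding row per film,
# plus the LLM-extracted boolean features.
films = ["Speed 2: Cruise Control", "Brokeback Mountain", "Moonlight", "Top Gun"]
embeddings = np.random.default_rng(0).normal(size=(4, 8))
features = [
    {"lgbt": False, "sports": False},
    {"lgbt": True, "sports": False},
    {"lgbt": True, "sports": False},
    {"lgbt": False, "sports": False},
]

def recommend(seed_index, feature, top_n=3):
    # Cosine similarity of every film against the seed film.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit @ unit[seed_index]
    # Rank by similarity, drop the seed itself, keep only films with the feature.
    order = np.argsort(scores)[::-1]
    return [films[i] for i in order
            if i != seed_index and features[i][feature]][:top_n]

print(recommend(0, "lgbt"))
```

With real embeddings the scores carry actual meaning; here they only show the shape of the query.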

I'm doing this all locally, on a second-hand consumer-level video card.

For the summarisation step, I'm going with quantised 14B models. I've tried them before and found them quite capable at tasks like this.

I decided to try out (all at 4-bit quantisation):

  * Phi4
  * DeepSeek
  * Ministral 3
  * Qwen 3

I used 10 film plots and evaluated the output by hand/vibes. This approach felt easier than using an LLM-as-a-judge with something like deepeval because I wasn't sure of the criteria I was after, but 10 films was also about the limit I could fit in my head.

Here's the prompt I used:

You are a film analyst. When given a film title, write a single sentence
that captures the genre, setting, plot, themes, tone, and audience appeal
of the film. Be factual and concise. Do not invent details.

Rules:
* Respond with one sentence only. No preamble, no follow-up.
* Cover: what kind of film it is, where and when it is set, what it is
  about, its central themes, and its overall tone.
* Do not begin with "This film..." or "The movie...".
* If you are unsure about a detail, omit it rather than guess.

Example output for Brokeback Mountain:
A slow-burning, melancholic LGBT romantic drama set across rural Wyoming
in the 1960s-80s, following two cowboys whose secret love affair unfolds
over decades against a backdrop of repression, identity, and the personal
cost of living inauthentically.

Phi4 and DeepSeek missed important details and produced quite generic output.

Ministral 3 and Qwen 3 were pretty close, but Qwen seemed to capture just a little extra detail. Very much vibes.

Neither Qwen nor Ministral would mention LGBT themes though, even with the Brokeback Mountain example in the prompt.

I thought it might be an alignment problem, some "don't say gay" trained in. When I looked at the actual plot outlines, quite a few didn't mention anything explicitly queer. It was too hard for a 14B model to infer from "both these characters have guys' names."

I fixed this by including all the categories the Wikipedia article was tagged with and adding an instruction to the prompt:

You MUST use the categories the film is listed in to create your summary.

This worked pretty well, despite the noise in the categories, like which studio it had come from, shooting locations, etc.
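
A sketch of what that category injection might look like; `build_prompt` and the example categories are my own illustration, not the actual code from this project:

```python
def build_prompt(title, plot, categories):
    # Wikipedia categories are noisy (studios, shooting locations, etc.)
    # but the model copes, so they're passed through unfiltered.
    cat_block = "\n".join(f"- {c}" for c in categories)
    return f"Film: {title}\n\nCategories:\n{cat_block}\n\nPlot:\n{plot}"

prompt = build_prompt(
    "Call Me by Your Name",
    "In 1983 northern Italy, ...",
    ["2017 films", "LGBTQ-related films", "Films shot in Lombardy"],
)
```

The "You MUST use the categories" instruction then points the model at that block.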

Call Me By Your Name Wikipedia categories

A similar question arose for the embedding model: what to choose, and something I can run myself.

I used Qwen3 to generate plot summaries of the same 10 films, embedded them using three different models and compared the resulting clustermaps:

all-minilm:33m cluster map

embeddinggemma:300m cluster map

qwen3-embedding:8b cluster map

qwen3-embedding and embeddinggemma were pretty close. I went with qwen3-embedding because it clustered Pride a little closer to Brassed Off and Trainspotting; that British-film distinction seemed useful.
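
The comparison behind those clustermaps is just pairwise cosine similarity between the embedded summaries. A minimal numpy sketch, with random vectors standing in for real embeddings:

```python
import numpy as np

# Toy embeddings standing in for the model outputs (10 films x 16 dims).
rng = np.random.default_rng(1)
vecs = rng.normal(size=(10, 16))

# Pairwise cosine similarity: normalise each row, then one matrix product.
unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
sim = unit @ unit.T

# seaborn.clustermap(sim) would then draw the dendrogram + heatmap
# used to eyeball how each embedding model groups the films.
```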

I had my models selected and some minimal code to validate everything worked. Time to start summarising and feature-extracting Wikipedia articles!

Except it's taking about 8 seconds per article and I have over 45,000 films (I excluded pre-1960 films). That's about 100 hours, or 4 days of GPU fan spinning. And I have a single consumer GPU, so this is very much a single-threaded process.

Not all films are equal though. I decided to tackle them in order, with the largest wiki articles first: ORDER BY length(wikitext) DESC. My theory being that films with longer articles are probably more notable: "Critical Reception", "Awards" sections and so on.
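
With everything in SQLite, that ordering is a one-line query. A self-contained sketch using an in-memory database (the table and column names here are assumptions):

```python
import sqlite3

# In-memory stand-in for the real film database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE films (title TEXT, wikitext TEXT)")
db.executemany("INSERT INTO films VALUES (?, ?)", [
    ("Obscure Short", "plot"),
    ("Famous Epic", "plot " * 500),
])

# Longest articles first: a rough proxy for notability.
rows = db.execute(
    "SELECT title FROM films ORDER BY length(wikitext) DESC"
).fetchall()
print([r[0] for r in rows])
```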

I had this running on and off for quite a while. It's late February and the solar panels on our house are already starting to collect enough energy that the house battery is charged and we're exporting to the grid. I find off-grid ML projects deeply satisfying.

Here's the prompt I used, there's room for improvement but answers seemed good:

/nothink
You are a film analyst. When given a film's plot and Wikipedia categories, you provide:
1. A single sentence summary capturing genre, setting, plot, themes, tone, and audience appeal
2. A structured classification of the film's attributes

You MUST use the categories the film is listed in to inform both outputs.
You MUST respond with valid JSON only. No preamble, no follow-up.

Summary rules:
* One sentence only
* Cover: what kind of film it is, where and when it is set, what it is about, its central themes, and its overall tone
* Do not begin with "This film..." or "The movie..."
* If you are unsure about a detail, omit it rather than guess

Example summary for Brokeback Mountain:
A slow-burning, melancholic LGBT romantic drama set across rural Wyoming in the 1960s-80s, following two cowboys whose secret love affair unfolds over decades against a backdrop of repression, identity, and the personal cost of living inauthentically.

Respond in this exact format:
{
    "summary": "<one sentence summary>",
    "action": <true if action film>,
    "animation": <true if animated>,
    "adventure": <true if adventure film>,
    "crime": <true if crime film>,
    "comedy": <true if comedy>,
    "documentary": <true if documentary>,
    "drama": <true if drama>,
    "fantasy": <true if fantasy>,
    "horror": <true if horror>,
    "lgbt": <true if contains LGBT themes>,
    "romance": <true if romance>,
    "period": <true if set before 1970>,
    "modern": <true if set 1970 or later>,
    "futuristic": <true if set in the future or sci-fi setting>,
    "western": <true if western genre>,
    "thriller": <true if a thriller>,
    "scifi": <true if scifi genre>,
    "musical": <true if a musical>,
    "sports": <true if primarily about sports>,
    "war": <true if primarily about war>,
    "biopic": <true if primarily about a person>
}

The /nothink turned off the reasoning mode, which didn't make much difference to the output on this task, but meant quite a few fewer tokens for my GPU to crunch through.

The biggest (pleasant) surprise for me was that this small, heavily quantised model produced valid JSON for more than 99% of responses. When it failed, it was typically an unterminated string in one of the response keys.
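
Handling that last ~1% doesn't need anything clever; returning None and letting the caller skip or retry covers it. A sketch of my guess at that validation step, not the project's exact code:

```python
import json

def parse_response(text):
    """Parse the model's JSON reply; None signals the caller to retry or skip."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Must at least carry the summary; missing booleans can default later.
    if "summary" not in data:
        return None
    return data

good = parse_response('{"summary": "A tense thriller...", "lgbt": true}')
bad = parse_response('{"summary": "An unterminated')  # the typical failure mode
```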

That's better than I had with GPT-3.5 and GPT-4 just a few years ago. The rate of progress is incredible.

I stopped after ~35,000 films. Can't tell you how many hours or watts burned, but the card was probably pulling 150W for most of that.

This worked nicely as a little Python CLI, but it would be much more fun as a web toy. Unfortunately, the SQLite database I'm storing it all in is 1.5GB. Not too practical for the web.

I thought I'd use Scikit's PCA to lossily compress my expensive 4096 dimension embeddings to something a bit more reasonable. Here's what PCA fitted to my film embeddings looked like across some candidate dimension counts:

(chart: explained variance ratio at each candidate dimension count)

or in words:

 dimensions: variance ratio
    64 dims: 0.624
   128 dims: 0.740
   256 dims: 0.837
   320 dims: 0.863
   512 dims: 0.910
   768 dims: 0.942
  1024 dims: 0.960
  1152 dims: 0.966

I went with 512 dimensions. It's noticeably, if only slightly, worse than the full 4096, but the important results weren't lost.
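
One nice property of scikit-learn's PCA: fit once at the largest candidate size and the cumulative explained_variance_ratio_ gives you every smaller cut for free. A sketch with toy data (sizes shrunk from the real 4096-dim embeddings):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy embeddings in place of the real high-dimensional vectors.
rng = np.random.default_rng(2)
embeddings = rng.normal(size=(200, 64))

# Fit once at the largest candidate size; the cumulative ratio then
# reports the variance retained at every smaller dimension count.
pca = PCA(n_components=32).fit(embeddings)
cumulative = np.cumsum(pca.explained_variance_ratio_)
for dims in (8, 16, 32):
    print(f"{dims:4d} dims: {cumulative[dims - 1]:.3f}")

# Components are ordered, so slicing the transform matches a smaller fit.
reduced = pca.transform(embeddings)[:, :16]
```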

The difference between 512 dimensions' 91% and 256 dimensions' 84% was very noticeable in practice, though.

I rolled my own 8-bit quantisation too, since I'm storing this in JSON and fewer digits means fewer bytes.
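
My rough reconstruction of what a per-vector min/max 8-bit scheme looks like (not the exact code from this project): map each float linearly onto 0-255, and store the two floats needed to undo the mapping.

```python
import numpy as np

def quantise(vec):
    # Map floats linearly onto 0..255; assumes the vector isn't constant.
    lo, hi = float(vec.min()), float(vec.max())
    q = np.round((vec - lo) / (hi - lo) * 255).astype(np.uint8)
    return q.tolist(), lo, hi  # JSON-friendly: small ints, not long floats

def dequantise(q, lo, hi):
    return np.asarray(q, dtype=np.float32) / 255 * (hi - lo) + lo

vec = np.random.default_rng(3).normal(size=512).astype(np.float32)
q, lo, hi = quantise(vec)
restored = dequantise(q, lo, hi)
```

The worst-case error is half a quantisation step, (hi - lo) / 510, which is plenty for cosine-similarity ranking.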

512 dimensions (at 8 bits) gives a 64MB JSON file (including title, description and features). 256 dimensions gave a 40MB file. I figured the difference was worth it, and the 64MB gzips down to 23MB anyway. About the size of a small React app.

This works! I've found some fun films I'd have otherwise missed.

My own recommendation from the Speed 2 + LGBT answers is the campy 1970s The Last of Sheila featuring a young Ian McShane rocking the lovejoy out of a white Lacoste cardigan:

Ian McShane wearing a white Lacoste cardigan

You can give it a try (app not cardigan) here.