Programmatic SEO at scale, without slop

The brief

The client sells an API. Their best customers find them by searching for very specific use cases — "how to extract line items from a Polish VAT invoice", "webhook signature verification in Elixir", that kind of thing. There are tens of thousands of those queries. Their marketing team is three people. The maths didn't work.

They'd already tried the obvious thing: hire a content agency to crank out pages. Three months in, the pages were generic, the rankings were nowhere, and Google had quietly started ignoring half of them. The agency suggested doubling down. The CMO called us instead.

What was actually broken

Two problems, tangled together. The first was scale: any human-written approach to long-tail SEO at this scope was going to take years. The second was quality: every AI-written approach they'd seen — and they'd seen a lot of them — produced pages that all sounded the same, said almost nothing, and had no relationship to the actual product.

The interesting part was that they had the missing ingredient sitting in their database. Live API logs. Real customer code samples (anonymised). SDK metadata. Error catalogues. A few thousand high-quality support tickets, indexed and tagged. The agency had been writing about the product. The pages they actually needed would be written from the product.

The shape of the system

We sketched it on a Tuesday and had a working spike by Friday. The shape held all the way to production:

A keyword warehouse in Supabase — every long-tail term they wanted to target, tagged by topic, language, complexity, intent. Seeded from a paid SEO tool, then deduped and clustered against their existing content.
A context layer — for each keyword, the pipeline pulls relevant code samples, error references, related endpoints and ticket excerpts from their internal systems. This is the part that makes the page non-generic. Most teams skip this step. It's where all the value lives.
A strict drafting template — Claude isn't told to "write an article about X". It's given a context bundle, a precise outline (problem → minimal example → gotchas → reference table → related), and a style guide that bans about thirty common AI tells.
Quality gates — every draft runs through automated checks before a human ever sees it: factual claims grounded in the context bundle, code samples actually executed against a sandbox, uniqueness against the existing corpus, reading-grade range, structured-data sanity.
A review queue — drafts that pass gates land in a Notion-shaped review queue. A human approves, edits, or kills. Approved pages are published to Webflow with full JSON-LD and internal links generated from the topic graph.
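
To make the drafting step concrete, here is a minimal Python sketch of a context bundle being turned into a constrained prompt. The outline is the one described above. The `ContextBundle` fields, the `build_prompt` and `find_banned` helpers, the thin-context rule and the three banned phrases are hypothetical stand-ins, not the client's actual schema or style guide.

```python
from dataclasses import dataclass, field

# Outline as described above; everything else in this sketch is illustrative.
OUTLINE = ["problem", "minimal example", "gotchas", "reference table", "related"]

# Hypothetical examples; the real style guide bans about thirty tells.
BANNED_PHRASES = ["delve into", "in today's fast-paced world", "unlock the power"]


@dataclass
class ContextBundle:
    keyword: str
    code_samples: list[str] = field(default_factory=list)
    error_refs: list[str] = field(default_factory=list)
    endpoints: list[str] = field(default_factory=list)
    ticket_excerpts: list[str] = field(default_factory=list)

    def is_thin(self) -> bool:
        # Assumed kill criterion: no code sample or no endpoint means no page.
        return not self.code_samples or not self.endpoints


def build_prompt(bundle: ContextBundle) -> str:
    """Assemble the user message for the drafting run from one bundle."""
    if bundle.is_thin():
        raise ValueError(f"context too thin for {bundle.keyword!r}")

    def block(title: str, items: list[str]) -> str:
        body = "\n".join(f"- {item}" for item in items) or "- (none)"
        return f"{title}:\n{body}"

    return "\n\n".join([
        f"Target query: {bundle.keyword}",
        block("Code samples", bundle.code_samples),
        block("Error references", bundle.error_refs),
        block("Related endpoints", bundle.endpoints),
        block("Ticket excerpts", bundle.ticket_excerpts),
        "Follow this outline exactly: " + " -> ".join(OUTLINE),
        "Use only facts from the context above.",
    ])


def find_banned(draft: str) -> list[str]:
    """Return the banned phrases that appear in a draft, case-insensitively."""
    lowered = draft.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]
```

Refusing to build a prompt from a thin bundle is the design choice worth noticing: a keyword with nothing behind it never reaches the model, so it never becomes a generic page.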

"I expected the AI part to be the hard part. Turned out the schema work, the context layer and the kill criteria were the hard parts. The model just did what it was told." — CMO, post-launch retro

What we actually built

Two things, sitting next to each other:

The pipeline. n8n orchestrating about a dozen steps — keyword expansion, context retrieval, draft, eval, publish — with a small Python service in the middle for the heavier checks. Claude on the drafting and the eval, separated into two runs with different system prompts so the model isn't grading its own homework.
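
The separation between the drafting run and the eval run can be sketched in a few lines of Python. This is an illustration under assumptions: `Model` is a hypothetical callable that takes a system prompt and a user message and returns text (it stands in for whatever client actually calls Claude), and both system prompts and the PASS/FAIL convention are invented for the example.

```python
from typing import Callable

# Hypothetical model interface: (system_prompt, user_message) -> reply text.
Model = Callable[[str, str], str]

# Illustrative prompts, not the ones used in production.
DRAFT_SYSTEM = "You write documentation pages strictly from the supplied context."
EVAL_SYSTEM = (
    "You are a reviewer. List every claim in the draft that the context does "
    "not support. Reply PASS or FAIL on the first line."
)


def draft_then_eval(model: Model, context: str, outline: str) -> tuple[str, bool]:
    """Run drafting and grading as two independent calls."""
    draft = model(DRAFT_SYSTEM, f"Context:\n{context}\n\nOutline:\n{outline}")

    # The eval run gets the context and the finished draft only. It never sees
    # the drafting prompt or any drafting conversation history.
    verdict = model(EVAL_SYSTEM, f"Context:\n{context}\n\nDraft:\n{draft}")

    passed = verdict.strip().upper().startswith("PASS")
    return draft, passed
```

Because `model` is injected, the same function can be exercised in tests with a fake callable, which keeps the orchestration logic checkable without spending tokens.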

The dashboards. A Metabase view on the same Supabase warehouse, so the team can see what's queued, what's published, what's ranking, and which patterns are failing. Search Console feeds back in, so by week four the keyword warehouse was being re-prioritised against pages that were actually moving.
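
One way the re-prioritisation could work, as a rough Python sketch: score each topic cluster by how much its published pages moved, then sort the queued keywords by their cluster's score. The field names (`cluster`, `clicks_prev`, `clicks_now`) and the scoring rule (mean change in clicks per page) are assumptions made for illustration, not the client's warehouse schema or their actual ranking logic.

```python
from collections import defaultdict


def cluster_scores(pages: list[dict]) -> dict[str, float]:
    """Mean change in clicks per published page, grouped by topic cluster."""
    totals = defaultdict(lambda: [0, 0])  # cluster -> [summed delta, page count]
    for page in pages:
        entry = totals[page["cluster"]]
        entry[0] += page["clicks_now"] - page["clicks_prev"]
        entry[1] += 1
    return {cluster: delta / count for cluster, (delta, count) in totals.items()}


def reprioritise(queued: list[dict], pages: list[dict]) -> list[dict]:
    """Order queued keywords so clusters that are already moving come first.

    Clusters with no published pages yet score 0.0, which places them between
    clusters that are gaining and clusters that are losing clicks.
    """
    scores = cluster_scores(pages)
    return sorted(queued, key=lambda k: scores.get(k["cluster"], 0.0), reverse=True)
```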

The whole stack costs the client about £140 a month to run, plus their LLM bill. The bill scales with the catalogue, not with the team.

The outcome

4,800 pages live within 6 weeks of kickoff.

+312% organic traffic, weeks 1–24 vs. prior baseline.

11% of pages drove 80% of the lift, and the pipeline learned from them.

Six weeks in, search traffic was on a clear new curve. By month six it was up 312% on the prior baseline, roughly four times the starting level, with a long tail that kept widening. About one in nine pages did most of the work, which is roughly what you'd expect from the long tail, but because the cost per page was low, the rest weren't a drag.

More importantly, the marketing team stopped being the bottleneck on content. Their job shifted from writing pages to shaping the system — adjusting templates, killing patterns that didn't earn their keep, feeding new keyword clusters in. It looks more like product management than blogging.

What we learned

Quality gates do most of the work.

If you trust the model to write and then trust the model to grade, you get exactly the slop everyone complains about. The eval pass has to be cheaper, narrower, and ideally non-AI: code that runs, schemas that validate, facts that resolve. The LLM is one input. The gates are the product.
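
Here is a minimal sketch of what non-AI gates might look like in Python: a code sample that has to exit cleanly, structured data that has to parse, endpoints that have to exist in the context, and a draft that must not duplicate the corpus. All function names, the 0.6 similarity threshold and the endpoint regex are assumptions for illustration. In particular, a plain subprocess is not a sandbox; it stands in for one here.

```python
import json
import re
import subprocess
import sys

ENDPOINT_RE = re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE) /[\w/{}-]*")


def code_runs(sample: str, timeout: float = 10.0) -> bool:
    """Code that runs: the sample must exit with status 0 (NOT a real sandbox)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", sample], capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def jsonld_is_sane(raw: str) -> bool:
    """Schemas that validate: parseable JSON object with @context and @type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "@context" in data and "@type" in data


def unresolved_endpoints(draft: str, known: set[str]) -> list[str]:
    """Facts that resolve: endpoints named in the draft but absent from the bundle."""
    return sorted(set(ENDPOINT_RE.findall(draft)) - known)


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def max_similarity(draft: str, corpus: list[str], n: int = 5) -> float:
    """Highest Jaccard similarity between the draft and any existing page."""
    a = shingles(draft, n)
    return max((len(a & shingles(o, n)) / len(a | shingles(o, n)) for o in corpus),
               default=0.0)


def run_gates(draft, code_samples, jsonld, corpus, known_endpoints=frozenset(),
              max_sim=0.6) -> list[str]:
    """Return the names of the failed gates; an empty list means the draft passes."""
    failures = []
    if not all(code_runs(sample) for sample in code_samples):
        failures.append("code")
    if not jsonld_is_sane(jsonld):
        failures.append("structured-data")
    if unresolved_endpoints(draft, set(known_endpoints)):
        failures.append("grounding")
    if max_similarity(draft, corpus) > max_sim:
        failures.append("uniqueness")
    return failures
```

Each gate returns a plain boolean or list, so a failure is cheap to compute, easy to log, and never depends on a model's opinion of its own output.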

Don't write about your product. Write from it.

The single biggest quality lever was the context bundle. The same model with no context produces a page that could be about anyone's product. The same model with a tightly assembled context bundle — code, errors, tickets, telemetry — produces a page that could only be about this one.

Treat the corpus like software.

Every page has a version, a generation hash, a template version, and an eval score. When the template improves we can re-run the affected slice. When a rule changes (say, Google starts penalising some pattern), we can find and fix every affected page in an afternoon. Without that, programmatic SEO becomes an unmaintainable junkyard within six months.
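
As a sketch of what that bookkeeping might look like in Python: the four fields come from the description above, while the record layout, the inputs to the hash (template version, model identifier, context text) and the helper names are assumptions for illustration.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    slug: str
    version: int
    template_version: str
    generation_hash: str
    eval_score: float


def generation_hash(template_version: str, model_id: str, context: str) -> str:
    """Fingerprint the inputs of a generation so identical inputs are detectable.

    Which inputs belong in the hash is an assumption; the unit separator keeps
    adjacent fields from blurring into each other.
    """
    payload = "\x1f".join([template_version, model_id, context])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stale_pages(pages: list[PageRecord], current_template: str) -> list[PageRecord]:
    """The slice to re-run after a template change."""
    return [p for p in pages if p.template_version != current_template]


def pages_matching(bodies: dict[str, str], pattern: str) -> list[str]:
    """Slugs of every page whose body contains a newly problematic pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(slug for slug, body in bodies.items() if rx.search(body))
```

With records like these, "re-run the affected slice" is a filter followed by a queue insert rather than an audit.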

The one thing to take away

Programmatic SEO is not a writing problem. It's a data-pipeline-with-good-taste problem. If you have a product worth searching for, you almost certainly already have the raw material to feed it. The question is whether you're willing to build the bit in the middle.
