I've been running three AI voice agents that call London restaurants. First Order asks for the must-order dish. The Queue Index asks about walk-in queues. Buggy Smart checks whether venues are buggy-friendly. Between them, they've made over 3,000 calls across 82 neighbourhoods.

Michelin sends inspectors. Harden's surveys diners. We send a phone call.

The technology works. That's the boring part. What's interesting is everything I got wrong about how it would work, and what actually mattered once the calls started landing. Each project has its own story (linked above). This is the infrastructure story that connects all three.

1. Restaurants answer the phone way more than you'd expect

The assumption going in was that most calls would hit voicemail. London restaurants are busy. They're understaffed. Why would they pick up a random call?

They do. On the Queue Index's first Saturday run, 27 out of 56 venues answered. That's a 48% hit rate at lunchtime. Independent restaurants, in particular, pick up. Especially around 1pm, when someone's standing near the phone anyway.

The chain restaurants with IVR systems and hold music? Forget it. But the places you actually want to hear from are the ones that answer.

2. The interesting data is the reaction, not the answer

I built these agents to collect information. Which dish should I order? How long is the queue? Is there space for a buggy?

The information is fine. But the personality of the response is what makes the product. Lahore Kebab House telling the AI to "just turn up, boss." The Marksman ending the call with "Love you." Jolene's philosophical stance that "if there isn't space, then there is no space."

These aren't data points. They're character sketches. And they tell you more about a restaurant than any review ever could.

3. 3pm is the magic hour

Timing matters more than the script. Call during lunch rush and you get cut off. Call at 5pm and the evening crew hasn't arrived yet. But 3pm, when lunch is winding down, the kitchen is cleaning up, and someone is standing near the phone with nothing urgent to do? That's when you get the best conversations.

One OpenStreetMap query pulled 2,993 London restaurants with phone numbers in 30 seconds. But knowing when to call them was worth more than having the list.

4. One question is the whole product

The Guinndex project proved this first, calling 3,000 Irish pubs to ask the price of a pint. One question. One answer. Done.

Every time I tried to add a second question, the responses got worse. People give you their best take when you only ask for one thing. The constraint isn't a limitation. It's the design decision that makes everything else work.

First Order asks one question: what's the one dish a first-timer has to order? Queue Index asks one question: is there a queue right now? The answer rate, the quality of responses, and the shareability of the results all come from that single-question constraint.

5. The AI detection moment is content

When Brawn's staff member said "Sam, this feels like I'm speaking to an automated service. Is that correct?" my instinct was to treat it as a failure. The cover was blown.

Then I realised it was the most shareable moment in the entire dataset. The detection moment is entertaining, human, and honest. Don't hide it. Feature it.

📞 Best detection line: "This feels like I'm speaking to an automated service. Is that correct?" (Brawn, Columbia Road)

6. Volume beats precision

A 15% success rate sounds terrible until you have 4,060 venues to call. That's roughly 600 successful conversations from a single batch. You don't need a curated list. You need a big one.

OpenStreetMap's Overpass API turned out to be the best free source for UK restaurant data. One query returned nearly 3,000 London restaurants with phone numbers, names, and coordinates. Every directory scraping approach (Yelp, OpenTable, DMN) was blocked or rate-limited. The open data was both free and better.
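For context, a pull like that can be sketched against the public Overpass endpoint using nothing but the standard library. The Overpass QL below is illustrative, not the project's actual query: the area filter ("Greater London", admin_level 5) and the tag choices are assumptions worth checking against the OSM wiki.

```python
import json
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# Overpass QL: every restaurant node/way inside the Greater London boundary
# that carries a phone tag. The area name and admin_level are assumptions.
QUERY = """
[out:json][timeout:120];
area["name"="Greater London"]["admin_level"="5"]->.london;
(
  node["amenity"="restaurant"]["phone"](area.london);
  way["amenity"="restaurant"]["phone"](area.london);
);
out center tags;
"""

def parse_venues(overpass_json):
    """Flatten an Overpass response into (name, phone, lat, lon) rows."""
    venues = []
    for el in overpass_json.get("elements", []):
        tags = el.get("tags", {})
        # Ways carry their coordinates under "center" when queried with `out center`.
        lat = el.get("lat") or el.get("center", {}).get("lat")
        lon = el.get("lon") or el.get("center", {}).get("lon")
        if tags.get("name") and tags.get("phone"):
            venues.append((tags["name"], tags["phone"], lat, lon))
    return venues

def fetch_venues():
    """POST the query to the public Overpass endpoint and parse the result."""
    req = urllib.request.Request(OVERPASS_URL, data=QUERY.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return parse_venues(json.load(resp))
```

One query, one POST, and the response already carries names, phone numbers, and coordinates, which is everything a calling list needs.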

7. The server should do one thing

Every project started as a monolith. The Railway server made calls, processed transcripts, classified responses, and deployed the site. Four failure modes in one process. When one broke, everything broke, and working out which part failed was its own project.

Now the server does one thing: make calls. ElevenLabs stores the transcripts. GitHub Actions fetches and classifies them. Netlify serves the site. Each piece can fail independently and get fixed without touching the others.

The architecture that works: Railway makes calls. ElevenLabs stores transcripts. GitHub Actions fetches from the ElevenLabs API. Claude classifies the results. Netlify deploys. A phone notification tells me it's done.
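One way to keep those stages independent is to run each one behind a small wrapper, so a failure names its stage instead of taking the whole process down with it. Everything below is a hypothetical sketch of that shape, with stand-in lambdas where the real fetch, classify, deploy, and notify logic would go:

```python
def run_stage(name, fn, payload):
    """Run one pipeline stage; on failure, report which stage broke.

    Returning (ok, result) keeps a failure local to its stage instead of
    letting one exception become "everything broke".
    """
    try:
        return True, fn(payload)
    except Exception as exc:
        print(f"stage '{name}' failed: {exc}")
        return False, None

def run_pipeline(stages, payload=None):
    """Chain stages in order; each receives the previous stage's output."""
    for name, fn in stages:
        ok, payload = run_stage(name, fn, payload)
        if not ok:
            return None  # stop at the broken stage; the rest stays untouched
    return payload

# Hypothetical stand-ins for the real stages: fetch -> classify -> deploy -> notify.
stages = [
    ("fetch",    lambda _: ["transcript-1", "transcript-2"]),
    ("classify", lambda ts: {t: "answered" for t in ts}),
    ("deploy",   lambda results: f"published {len(results)} results"),
    ("notify",   lambda msg: msg),
]

print(run_pipeline(stages))  # -> published 2 results
```

The payoff is exactly the debugging property described above: when one stage throws, the log says which one, and the others don't need touching.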

8. Your API provider is your database

The breakthrough that simplified everything: ElevenLabs stores every conversation permanently. Every transcript, every recording, every metadata field. Once I stopped trying to save transcripts on my own servers, the whole architecture collapsed into something elegant.

No database to manage. No filesystem to worry about. No backup strategy to design. The data lives somewhere permanent that someone else maintains. The processing pipeline just fetches from it.
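A minimal sketch of that fetch step, assuming ElevenLabs' conversational-AI REST endpoints (`/v1/convai/conversations` and a per-conversation detail path): the paths, the `xi-api-key` header, and the transcript field names (`role`, `message`) are assumptions here and should be checked against the current API reference.

```python
import json
import os
import urllib.request

# Assumed ElevenLabs Conversational AI base URL and endpoints; verify
# paths and response fields against the current API docs before relying on them.
API_BASE = "https://api.elevenlabs.io/v1/convai"

def _get(path):
    """Authenticated GET against the assumed ElevenLabs API."""
    req = urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def fetch_transcripts():
    """List recent conversations, then pull each full transcript.

    The provider is the database: nothing is written to local disk,
    the processing pipeline just reads from it on every run.
    """
    listing = _get("/conversations")
    transcripts = []
    for convo in listing.get("conversations", []):
        detail = _get(f"/conversations/{convo['conversation_id']}")
        transcripts.append(detail.get("transcript", []))
    return transcripts

def format_transcript(transcript):
    """Render a transcript (a list of {"role", "message"} turns) as plain text."""
    return "\n".join(f"{t['role']}: {t['message']}" for t in transcript)
```

Because the fetch is idempotent, the downstream classify-and-deploy steps can rerun from scratch at any time without a restore step.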

9. Shared resources break in ways you can't see

Two projects sharing one Twilio phone number seemed efficient. They ran at different times, so there was no overlap. Until there was.

The calls overlapped silently. No error messages. No alerts. Just bad data in both projects, and an afternoon spent working out why the transcripts didn't match the venues. Separating each project onto its own infrastructure cost an extra five quid a month and eliminated an entire category of debugging.

The resources you didn't know were shared are the ones that break at the worst possible time.

10. Ship the pipeline, not the product

The calling works. The AI voice is convincing. The responses are genuinely interesting. None of that matters if the data never reaches the website.

What broke, repeatedly, was everything between the call and the published page. Railway's filesystem wiping on redeploy. Silent API failures. Missing notifications. Transcript processing that looked correct but produced empty results.

The plumbing is the product. Get that right and the data takes care of itself.

The agents now run autonomously. Buggy Smart calls three times a day. Queue Index runs on Saturdays. First Order calls on weekend evenings. Each one fetches from ElevenLabs, classifies with Claude, deploys to Netlify, and sends a notification with the results. No human in the loop.

The product isn't the website. The product is the pipeline that populates it.

What's next

4,060 London venues in the First Order database. 82 neighbourhoods covered. Three agents running in parallel, each on its own infrastructure, each doing one thing well.

The next step isn't more technology. It's more calls. The architecture is solid. The pipeline is reliable. Now it's just about pointing it at more restaurants and letting the conversations accumulate. Volume, timing, and one good question. That's the whole playbook.

The three projects

First Order asks London's restaurants their one essential dish. Michelin sends inspectors. We send a phone call. See the deck.

The Queue Index calls 50 restaurants every Saturday and asks how long the wait is. The quotes are the product. See the deck.

Buggy Smart maps which London cafes your pushchair can get into. Made by a London dad who got tired of guessing. See the deck.