Mike Litman
20 Voice Agent Learnings
A TALK BY MIKE LITMAN

20 Things We Learned
Building AI Voice Agents
That Call London's Restaurants

5 projects. 4,000+ venues. 2,500+ calls.
Every lesson came from something going wrong.

Buggy Smart · First Order · London With Moshi · Queue Index · OOS Hotline

The numbers

Four autonomous pipelines.
One very long list
of failures.

5 Projects
4,000+ Venues
2,500+ Calls made
20 Lessons

Architecture
& Systems

What happens when you build clever instead of boring.

1
Insight 01 · Architecture & Systems

Two systems deploying to the same place will eventually fight

Railway had its own classify-and-deploy pipeline running alongside ours. It produced 67 venues and overwrote our 231. The fix wasn't "make both better" – it was "pick one and disable the other."

Single source of truth. Always.

2
Insight 02 · Architecture & Systems

Architecture should be boring

Every problem came from cleverness: Railway's own pipeline, shared numbers with schedule gymnastics, two classification systems. Every fix was simplification: one pipeline, one deployer, one number per project.

The boring architecture works at 3am.

3
Insight 03 · Architecture & Systems

"Separate" means separate – check every shared dependency

We said separate phone numbers, but Queue Index was still sharing with Alice. We said separate pipelines, but Railway was still deploying alongside GitHub Actions. Every time we checked "are we really separate?" we found another shared dependency.

Audit everything. Not just the obvious stuff.

4
Insight 04 · Architecture & Systems

Safety checks only protect against the system they're in

Our "don't deploy if count drops" guard worked perfectly – the pipeline produced 253 venues. But Railway deployed separately and overwrote it. Guards protect against self-harm but not external harm.

Design for the full threat model, not just yourself.
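
A minimal sketch of a count-drop guard that compares against the live deployed count, read immediately before deploying, rather than against the pipeline's own previous output. The function name and the 5% tolerance are assumptions for illustration, not the guard used in these projects.

```python
def safe_to_deploy(new_count: int, live_count: int, max_drop: float = 0.05) -> bool:
    """Return True if deploying would not shrink the live venue count
    by more than max_drop (a fraction, e.g. 0.05 for 5%)."""
    if live_count == 0:
        # Nothing is live yet: only deploy if there is something to publish.
        return new_count > 0
    return new_count >= live_count * (1 - max_drop)
```

With the numbers above, `safe_to_deploy(67, 231)` is False and `safe_to_deploy(253, 231)` is True. The check only helps if every deployer runs it; a second system writing to the same place bypasses it entirely, which is the point of this slide.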

5
Insight 05 · Architecture & Systems

Two Claude sessions on the same repo is learning #1 in real time

The other session reverted our workflow 3+ times. It had stale context and genuinely believed it was fixing things. CLAUDE.md files and CANONICAL headers are the defence – but the real fix is one session per repo.

One session. One repo. No exceptions.

Infrastructure
& Operations

The infrastructure always has an opinion. Listen to it.

6
Insight 06 · Infrastructure & Operations

Ephemeral infrastructure has no memory

Railway's filesystem resets on container restart. call-status.json vanished. The server forgot who it had already called. Joy Cafe told us to stop calling. The fix: check the live deployed data before every call batch. Never trust local state on ephemeral infrastructure.

If it can restart, it will restart. Design accordingly.
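
One way to sketch "check the live deployed data before every call batch": filter the candidate list against records fetched from the live data, not against a local file. The field names (`venue_id`, `called`, `do_not_call`) are hypothetical; the real schema is not shown in the talk.

```python
def venues_to_call(candidates, live_records):
    """Drop venues that the live deployed data says were already called
    or that asked not to be called again."""
    blocked = {
        record["venue_id"]
        for record in live_records
        if record.get("called") or record.get("do_not_call")
    }
    return [venue for venue in candidates if venue["venue_id"] not in blocked]
```

Because the blocklist is rebuilt from live data on every batch, a container restart loses nothing.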

7
Insight 07 · Infrastructure & Operations

Don't assume infrastructure auto-deploys

We pushed a critical server.js fix and assumed Railway would pick it up. It didn't. Had to manually trigger a redeploy. Nearly caused the same regression twice.

Verify the deploy happened. Then verify it took effect.
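
A rough sketch of "verify it took effect", assuming the server can report which version it is running (for example a commit hash served from a hypothetical version endpoint). `fetch_version` is a caller-supplied function, so nothing here depends on a specific Railway API.

```python
import time

def wait_for_deploy(expected_version, fetch_version, attempts=5, delay_s=10.0):
    """Poll fetch_version() until it returns expected_version.
    Returns True on a match, False if every attempt saw another version."""
    for attempt in range(attempts):
        if fetch_version() == expected_version:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give the platform time to roll out
    return False
```

If this returns False after a push, the deploy needs a manual trigger before the fix can be considered live.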

8
Insight 08 · Infrastructure & Operations

Caching is what makes the pipeline viable

Processing 2,000+ conversations from scratch would take hours. Because every step caches – fetch, classify, match – re-runs only process new data. This makes 3x daily runs practical.

Cache early, cache often, cache at every step.
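
A minimal sketch of per-step caching, assuming each conversation has a stable, filesystem-safe ID and each step's result is JSON-serialisable. The directory layout and function name are illustrative, not the pipeline's actual code.

```python
import json
from pathlib import Path

def cached_step(cache_dir, step, key, compute):
    """Return the cached result for (step, key), computing and storing it
    only when no cache file exists yet."""
    path = Path(cache_dir) / step / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute()  # the expensive part: fetch, classify or match
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

On a re-run, only conversations without a cache file reach `compute`. The cache directory has to live somewhere persistent; on an ephemeral filesystem it would vanish on restart, as in Insight 06.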

9
Insight 09 · Infrastructure & Operations

"Definitely no conflict" needs worst-case analysis, not best-case

We said "no overlap" three times before mapping out that a 5pm batch could run until 7:30pm. Each check found another edge case. The real fix was removing the dependency entirely – separate phone numbers – rather than scheduling around it.

If you're asking "is there a conflict?" there probably is.
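
The worst-case check can be sketched as interval arithmetic: compute the latest possible end of each batch, then test for overlap. The batch size (50 calls) and per-call ceiling (3 minutes) in the usage note are illustrative numbers chosen to reproduce the 5pm to 7:30pm case; they are not figures from the projects.

```python
from datetime import datetime, timedelta

def worst_case_end(start, max_calls, max_call_minutes):
    """Latest time a sequential batch could finish if every call runs long."""
    return start + timedelta(minutes=max_calls * max_call_minutes)

def overlaps(start_a, end_a, start_b, end_b):
    """True if the two half-open time windows [start, end) intersect."""
    return start_a < end_b and start_b < end_a
```

A batch starting at 17:00 with `max_calls=50` and `max_call_minutes=3` has a worst-case end of 19:30, so it overlaps a second batch scheduled for 19:00 even though the typical run finishes well before then.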

10
Insight 10 · Infrastructure & Operations

Check what you already have before buying

Spent an hour staggering schedules before realising there was an unused Twilio number already on the account. Zero cost, zero constraints. Always inventory existing resources first.

The solution you need might already be in the account.

AI & Voice Agent
Behaviour

The agent has its own ideas. Some of them are fine.

11
Insight 11 · AI & Voice Agent Behaviour

Most London restaurants now have AI answering their phones

Out of 242 First Order conversations, ~85% hit IVR, voicemail, or an AI receptionist. The big chains almost never have a human pick up. The indie spots do.

Target indie venues, not chains. They answer.

12
Insight 12 · AI & Voice Agent Behaviour

AI response format drift breaks pipelines silently

Claude started wrapping JSON in markdown fences. The classifier's regex didn't handle it, causing 63–77 errors per run. Silent format changes in AI responses are a real operational risk.

Always parse defensively. AI output formats will drift.
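
A minimal sketch of defensive parsing for this failure mode: accept bare JSON, JSON wrapped in a markdown fence, or JSON surrounded by prose. This is one possible approach, not the classifier's actual fix.

```python
import json
import re

# Matches a markdown code fence (three backticks), with or without a "json" tag.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)

def parse_model_json(text):
    """Parse JSON from a model response, tolerating fences and stray prose."""
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in the raw text.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise
```

Anything that still fails raises `json.JSONDecodeError`, so the error count stays visible instead of being swallowed.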

13
Insight 13 · AI & Voice Agent Behaviour

The agent's opening line gets improvised no matter what

The prompt says "quick question" but the agent keeps saying "silly question." The LLM treats the script as a suggestion, not a command. After three prompt iterations we stopped fighting it – it still sounds natural and gets answers.

Guide the outcome, not the exact words. It usually lands better anyway.

14
Insight 14 · AI & Voice Agent Behaviour

AI calling AI is already happening

Gloria London's AI receptionist had a perfectly polite conversation with our AI about pushchair access. Neither knew the other wasn't human. We've arrived at the AI-calls-AI future – and it happened over a question about buggies in Shoreditch.

We're already living in it. Build for that world.

Data
& Product

The messy, human, funny stuff is the actual product.

15
Insight 15 · Data & Product

Venue data quality follows a power law

Hackney has 82 rated venues; Clapham has 2. Most recommendations came from a handful of indie restaurants. Priority retry ranking – call underserved areas first – fills the gaps rather than piling more ratings onto areas that are already well covered.

Distribution matters as much as volume. Rank by gap, not by size.
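
"Rank by gap" can be sketched as sorting areas by how few rated venues they already have. The function name and input shape (area name mapped to rated-venue count) are assumptions for illustration.

```python
def rank_areas_by_gap(rated_counts):
    """Order areas so those with the fewest rated venues are retried first.
    Ties are broken alphabetically to keep the order stable between runs."""
    return sorted(rated_counts, key=lambda area: (rated_counts[area], area))
```

With the counts above, Clapham (2) is retried before Hackney (82).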

16
Insight 16 · Data & Product

The best content comes from the failures

Tarantino trying to order wine from Alice. Lahore Karahi saying "what a call, man." Joy Cafe telling us to stop. The clean "yes we are" answers are data. The messy, human, funny ones are content.

They're what people share. Capture everything.

17
Insight 17 · Data & Product

The quotes are the product

Wait times are structure; quotes are soul. Noble Rot offering "a little perch and a glass of wine." CuppaPug storing the pram "to stop the pugs from weeing on it." These are what people share, not the green/amber/red ratings.

Build infrastructure, but ship the soul.

Process
& Iteration

Better beats bigger. And writing it down is the whole game.

18
Insight 18 · Process & Iteration

Autonomous systems need monitoring, not trust

Everything ran today without intervention. But the data didn't grow until the pipeline was run manually. "Success" in GitHub Actions meant the workflow completed, not that new venues appeared on the map.

The pipeline needs to verify its own output, not just report that it ran.
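
A minimal sketch of an output check, assuming the pipeline can count live venues before and after a run. The function name and the `min_new` threshold are hypothetical; a run that legitimately finds nothing new would need a threshold of 0.

```python
def check_output(before_count, after_count, min_new=1):
    """Fail loudly if a 'successful' run did not actually add venues."""
    added = after_count - before_count
    if added < min_new:
        raise RuntimeError(
            f"run completed but only {added} new venues appeared "
            f"(expected at least {min_new})"
        )
    return added
```

Run as the last step of the workflow, an uncaught exception here exits non-zero, so the job shows as failed instead of green.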

19
Insight 19 · Process & Iteration

Success rate improves through iteration, not volume

Queue Index went from 23% to 69% through prompt fixes, classifier tuning, and call timing changes – not by calling more venues. Three iterations beat three hundred extra calls.

Fix the quality before scaling the quantity.

20
Insight 20 · Process & Iteration

You learn more from what goes wrong

Every insight on this list came from a failure, not a success. The 231-to-67 regression taught us single source of truth. Joy Cafe taught us ephemeral state. The reverted workflows taught us about session conflicts.

Build a culture of noticing what broke and why. This list is that culture.

Keep building.
Keep breaking.
Keep writing it down.

Mike Litman · mikelitman.me · hello@mikelitman.me

Updated March 2026 · This list grows with every session
