4 projects. 4,000+ venues. 2,500+ calls.
Every lesson came from something going wrong.
What happens when you build clever instead of boring.
Railway had its own classify-and-deploy pipeline running alongside ours. It produced 67 venues and overwrote our 231. The fix wasn't "make both better" – it was "pick one and disable the other."
Single source of truth. Always.
Every problem came from cleverness: Railway's own pipeline, shared phone numbers propped up by schedule gymnastics, two classification systems. Every fix was simplification: one pipeline, one deployer, one number per project.
The boring architecture works at 3am.
We said separate phone numbers, but Queue Index was still sharing a number with Alice. We said separate pipelines, but Railway was still deploying alongside GitHub Actions. Every time we asked "are we really separate?" we found another shared dependency.
Audit everything. Not just the obvious stuff.
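One way to turn "are we really separate?" into a repeatable check rather than a hunch: list every exclusive resource each component claims, then flag anything with more than one owner. This is a minimal sketch. The component names, the `phone:`/`deploy:` claim strings and the placeholder numbers are hypothetical, not the project's real inventory.

```python
from collections import defaultdict

# Hypothetical inventory of (component, exclusive resource) claims.
# The numbers are placeholders, not real lines on the account.
CLAIMS = [
    ("queue-index-caller", "phone:+442079460001"),
    ("alice-caller", "phone:+442079460001"),
    ("first-order-caller", "phone:+442079460002"),
    ("github-actions-pipeline", "deploy:venues.json"),
    ("railway-pipeline", "deploy:venues.json"),
]


def find_shared(claims: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return each resource claimed by more than one component."""
    owners: dict[str, list[str]] = defaultdict(list)
    for component, resource in claims:
        owners[resource].append(component)
    return {res: sorted(comps) for res, comps in owners.items() if len(comps) > 1}


for resource, components in find_shared(CLAIMS).items():
    print(f"SHARED {resource}: {', '.join(components)}")
```

An empty result is the goal: one owner per number, one deployer per data file.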
Our "don't deploy if count drops" guard worked perfectly – the pipeline produced 253 venues. But Railway deployed separately and overwrote that output. Guards protect against self-harm, not external harm.
Design for the full threat model, not just yourself.
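A count-drop guard of the kind described above might look like this. It is a sketch, not the project's code: the function name and the `tolerance` parameter are assumptions. Note what it cannot do. It only runs inside the pipeline that calls it, so a second deployer that never consults it can still overwrite the data, which is exactly the gap this lesson describes.

```python
def should_deploy(new_count: int, live_count: int, tolerance: float = 0.0) -> bool:
    """Allow a deploy only if it does not shrink the live dataset.

    tolerance is a hypothetical knob: the fraction of shrinkage to accept
    (0.0 blocks any drop at all).
    """
    if live_count <= 0:
        # Nothing live yet: deploy anything non-empty.
        return new_count > 0
    return new_count >= live_count * (1 - tolerance)


print(should_deploy(253, 231))  # growth, so the deploy goes ahead
print(should_deploy(67, 231))   # a 231-to-67 drop, so the deploy is blocked
```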
A second session working on the same repo reverted our workflow 3+ times. It had stale context and genuinely believed it was fixing things. CLAUDE.md files and CANONICAL headers are the defence – but the real fix is one session per repo.
One session. One repo. No exceptions.
The infrastructure always has an opinion. Listen to it.
Railway's filesystem resets on container restart. call-status.json vanished, the server forgot who it had already called, and venues got called again. Joy Cafe told us to stop calling. The fix: check the live deployed data before every call batch. Never trust local state on ephemeral infrastructure.
If it can restart, it will restart. Design accordingly.
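A minimal sketch of "check the live deployed data before every call batch", assuming the deployed dataset is JSON shaped like `{"venues": [{"id": ..., "status": ...}]}`. The URL (an example.com placeholder), the schema and the status values are all hypothetical. The point is that the skip list is rebuilt from the deployed data on every batch, so a restart that wipes local files cannot lead to repeat calls.

```python
import json
import urllib.request

# Hypothetical location of the deployed dataset.
LIVE_DATA_URL = "https://example.com/data/venues.json"

# Hypothetical statuses that mean "do not dial this venue again".
SKIP_STATUSES = {"called", "do_not_call"}


def fetch_live(url: str = LIVE_DATA_URL) -> dict:
    """Download the currently deployed dataset (the source of truth)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def filter_batch(candidates: list[str], live: dict) -> list[str]:
    """Drop any candidate the live data marks as already called or opted out."""
    skip = {
        v["id"]
        for v in live.get("venues", [])
        if v.get("status") in SKIP_STATUSES
    }
    return [venue_id for venue_id in candidates if venue_id not in skip]


# In the caller: batch = filter_batch(candidates, fetch_live())
```

If the fetch fails, the safe choice is to skip the batch rather than fall back to whatever happens to be on local disk.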
We pushed a critical server.js fix and assumed Railway would pick it up. It didn't. Had to manually trigger a redeploy. Nearly caused the same regression twice.
Verify the deploy happened. Then verify it took effect.
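Both halves of that check can be automated. The sketch below assumes the server can report the commit SHA it is running (for example through a small version endpoint) and that you have a smoke test for the fix itself; `get_live_sha`, `smoke_test` and the timing defaults are hypothetical. It waits until the live SHA matches what was pushed (the deploy happened), then runs the smoke test (it took effect).

```python
import time
from typing import Callable


def verify_deploy(
    expected_sha: str,
    get_live_sha: Callable[[], str],
    smoke_test: Callable[[], bool],
    timeout_s: float = 600,
    interval_s: float = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the pushed commit is live, then confirm the behaviour changed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_live_sha() == expected_sha:
            # Step 1 passed: the deploy happened. Step 2: did it take effect?
            return smoke_test()
        if time.monotonic() >= deadline:
            # Never saw the new commit: the platform did not pick up the push.
            return False
        sleep(interval_s)
```

A `False` here is the signal to trigger a manual redeploy before anything else runs.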
Processing 2,000+ conversations from scratch would take hours. Because every step caches – fetch, classify, match – re-runs only process new data. This makes 3x daily runs practical.
Cache early, cache often, cache at every step.
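Per-step caching of this kind can be as simple as keying each step's result by a hash of its input and persisting that map between runs. The sketch below is one assumed way to do it, not the project's implementation: `StepCache`, the one-JSON-file-per-step layout and `run_step` are hypothetical, and results must be JSON-serialisable. On a re-run, only inputs without a cached result are computed. The cache directory has to live somewhere that survives between runs, which on ephemeral infrastructure means outside the container's own filesystem.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


class StepCache:
    """Hypothetical per-step cache, persisted as <root>/<step>.json."""

    def __init__(self, root: Path, step: str) -> None:
        self.path = Path(root) / f"{step}.json"
        # Load results from previous runs, if any.
        self.data: dict[str, Any] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    @staticmethod
    def key(item: Any) -> str:
        # Canonical JSON hashed with SHA-256, so identical inputs share a key.
        return hashlib.sha256(json.dumps(item, sort_keys=True).encode()).hexdigest()

    def get_or_compute(self, item: Any, compute: Callable[[Any], Any]) -> Any:
        k = self.key(item)
        if k not in self.data:
            self.data[k] = compute(item)  # only new inputs reach this line
        return self.data[k]

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.data))


def run_step(items: list[Any], cache: StepCache, compute: Callable[[Any], Any]) -> list[Any]:
    """Run one pipeline step (fetch, classify or match), reusing cached results."""
    results = [cache.get_or_compute(item, compute) for item in items]
    cache.save()
    return results
```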
We said "no overlap" three times before mapping out that a 5pm batch could run until 7:30pm. Each check found another edge case. The real fix was removing the dependency entirely – separate phone numbers – rather than scheduling around it.
If you're asking "is there a conflict?" there probably is.
Spent an hour staggering schedules before realising there was an unused Twilio number already on the account. Zero cost, zero constraints. Always inventory existing resources first.
The solution you need might already be in the account.
The agent has its own ideas. Some of them are fine.
Out of 242 First Order conversations, ~85% hit IVR, voicemail, or an AI receptionist. The big chains almost never have a human pick up. The indie spots do.
Target indie venues, not chains. They answer.
Claude started wrapping JSON in markdown fences. The classifier's regex didn't handle it, causing 63–77 errors per run. Silent format changes in AI responses are a real operational risk.
Always parse defensively. AI output formats will drift.
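A defensive parser for this failure mode might strip an optional markdown fence before parsing and, failing that, fall back to the outermost `{...}` span. This is a sketch with a hypothetical function name, not the classifier's actual code. It handles the fence case described above plus chatter before or after a JSON object, and it still raises if no JSON can be recovered, so genuine failures stay visible as errors.

```python
import json
import re

# An optional ```json (or bare ```) fence wrapped around the whole reply.
_FENCE = re.compile(r"^```[A-Za-z]*\s*\n(.*?)\n?\s*```$", re.DOTALL)


def parse_model_json(text: str):
    """Parse JSON from an LLM reply, tolerating fences and surrounding prose."""
    cleaned = text.strip()
    match = _FENCE.match(cleaned)
    if match:
        cleaned = match.group(1)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fallback: the outermost {...} span, for replies with text around the object.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            return json.loads(cleaned[start : end + 1])
        raise
```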
The prompt says "quick question" but the agent keeps saying "silly question." The LLM treats the script as a suggestion, not a command. After three prompt iterations we stopped fighting it – it still sounds natural and gets answers.
Guide the outcome, not the exact words. It usually lands better anyway.
Gloria London's AI receptionist had a perfectly polite conversation with our AI about pushchair access. Neither knew the other wasn't human. We've arrived at the AI-calls-AI future – and it happened over a question about buggies in Shoreditch.
We're already living in it. Build for that world.
The messy, human, funny stuff is the actual product.
Hackney has 82 rated venues; Clapham has 2. Most recommendations came from a handful of indie restaurants. Priority retry ranking – call underserved areas first – fills gaps rather than piling more results onto areas that are already well covered.
Distribution matters as much as volume. Rank by gap, not by size.
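Ranking by gap can be a single sort. The sketch assumes a hypothetical per-area coverage `target` (the default of 20 is arbitrary) and orders retries so the areas furthest below it come first. The Hackney and Clapham counts are the ones quoted above; the venue names and the Brixton entry are placeholders.

```python
def rank_retries(
    retries: list[tuple[str, str]],
    rated_per_area: dict[str, int],
    target: int = 20,
) -> list[tuple[str, str]]:
    """Order (venue, area) retries so the areas furthest below target come first.

    target is a hypothetical per-area coverage goal; areas already at or
    above it get a gap of 0 and sink to the back of the queue.
    """

    def gap(area: str) -> int:
        return max(target - rated_per_area.get(area, 0), 0)

    # Biggest gap first; venue name as a tie-breaker keeps the order deterministic.
    return sorted(retries, key=lambda r: (-gap(r[1]), r[0]))


rated = {"Hackney": 82, "Clapham": 2}
queue = [("Venue A", "Hackney"), ("Venue B", "Clapham"), ("Venue C", "Brixton")]
print(rank_retries(queue, rated))
# [('Venue C', 'Brixton'), ('Venue B', 'Clapham'), ('Venue A', 'Hackney')]
```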
Tarantino trying to order wine from Alice. Lahore Karahi saying "what a call, man." Joy Cafe telling us to stop. The clean "yes we are" answers are data. The messy, human, funny ones are content.
They're what people share. Capture everything.
Wait times are structure; quotes are soul. Noble Rot offering "a little perch and a glass of wine." CuppaPug storing the pram "to stop the pugs from weeing on it." These are what people share, not the green/amber/red ratings.
Build infrastructure, but ship the soul.
Better beats bigger. And writing it down is the whole game.
Everything ran today without intervention. But the data didn't grow until the pipeline was run manually. "Success" in GitHub Actions meant the workflow completed, not that new venues appeared on the map.
The pipeline needs to verify its own output, not just report that it ran.
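One way to make a green run mean "new venues appeared" is a final step that compares the live venue count before and after the run and exits non-zero when it did not grow; a non-zero exit fails the GitHub Actions step. The sketch leaves out how the counts are read (ideally from the deployed data, not local files), and `check_growth`, `main` and `min_new` are hypothetical names. Whether zero growth should fail or only warn depends on the run; `min_new` is the knob.

```python
def check_growth(before: int, after: int, min_new: int = 1) -> tuple[bool, str]:
    """Pass only if the live dataset gained at least min_new venues."""
    gained = after - before
    if gained >= min_new:
        return True, f"OK: {gained} new venues live ({before} -> {after})"
    return False, (
        f"FAIL: expected at least {min_new} new venues, "
        f"live count went {before} -> {after}"
    )


def main(argv: list[str]) -> int:
    """Usage: <script> BEFORE_COUNT AFTER_COUNT; returns a process exit code."""
    before, after = int(argv[1]), int(argv[2])
    ok, message = check_growth(before, after)
    print(message)
    return 0 if ok else 1
```

Called as `sys.exit(main(sys.argv))` in the last workflow step, a run that completes without adding venues shows up as a failure instead of a success.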
Queue Index went from 23% to 69% through prompt fixes, classifier tuning, and call timing changes – not by calling more venues. Three iterations beat three hundred extra calls.
Fix the quality before scaling the quantity.
Every insight on this list came from a failure, not a success. The 231-to-67 regression taught us single source of truth. Joy Cafe taught us ephemeral state. The reverted workflows taught us about session conflicts.
Build a culture of noticing what broke and why. This list is that culture.
Mike Litman · mikelitman.me · hello@mikelitman.me
Updated March 2026 · This list grows with every session