There was no map for this.
Google will tell you the coffee is excellent. TripAdvisor will tell you the service was friendly. Neither will tell you whether there is a step at the door, a narrow corridor, or a host who looks at you as though you have brought a small industrial vehicle through the entrance. Which, with a pushchair, you have.
I have a two-year-old and a pushchair and I live in London. I started Buggy Smart because I was tired of guessing. Not a tech demo, not a clever prompt: a real dataset for a real thing I needed. The kind of thing you build when you are the user and you have run out of patience.
Buggy Smart just crossed 1,000 verified venues. 1,140 London cafes and restaurants, each one rung by a voice agent, each one answered, each one classified against a single question: is there space for a buggy? The dataset that didn't exist now does.
One question, ten thousand times
The way it works is simple to describe and was not simple to build. Poppy, the voice agent, rings a London cafe or restaurant and asks one question: is there space for a buggy? She listens to the answer, asks a clarifying follow-up if she needs one, and the call is logged, transcribed and classified.
She has now made more than ten thousand of these calls. Eight of them ended with the person on the other end working out she was an AI. A 99.9% pass rate, which is a separate essay, but worth noting before we get to the part where almost everything went wrong.
Three silent failures that nearly killed it
If you only looked at the front end, Buggy Smart had been running smoothly for months. The map grew. The calls went out. The dashboard ticked upward. Everything looked fine. Everything was not fine.
The first failure was the classifier. Until 18 April, 86% of calls were being bucketed as "unclear". Poppy was getting clean, usable answers on the phone, and a downstream step was throwing almost all of them into a pile labelled "we do not know". I only caught it because I started spot-checking transcripts and realised the actual answers were in there; they just were not being read properly. One fix, and thousands of calls suddenly had verdicts.
The second failure was structural. The pipelines were dying at exactly 60 minutes, every single day. GitHub Actions stops any job that runs past its timeout, and mine were hitting a 60-minute limit. The runs were ending mid-batch with no loud error, just a polite little stop. The map looked like it was growing. It was, but slowly, because half of every day's work was being guillotined at the hour mark.
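One way to avoid a silent cut-off at the hour mark is to give the batch its own time budget, so it stops on purpose and says so. What follows is a minimal sketch, not the actual Buggy Smart pipeline: the `run_batch` helper, the `call_venue` callback and the 55-minute budget are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_batch(
    venues: Iterable[str],
    call_venue: Callable[[str], None],
    budget_seconds: float = 55 * 60,  # assumed: stop 5 minutes before a 60-minute limit
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], List[str]]:
    """Process venues until the time budget runs out.

    Returns (done, remaining) so the caller can log or requeue the
    leftovers instead of losing them when the runner kills the job.
    """
    start = clock()
    queue = list(venues)
    done: List[str] = []
    for index, venue in enumerate(queue):
        # Check the budget before each call, never in the middle of one.
        if clock() - start >= budget_seconds:
            remaining = queue[index:]
            print(f"Budget reached: {len(done)} done, {len(remaining)} left over")
            return done, remaining
        call_venue(venue)
        done.append(venue)
    return done, []
```

The point of the design is that the stop becomes loud and recoverable: the leftover venues come back as a list that the next run can pick up, rather than vanishing when the job is terminated.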
The third failure was the most expensive per unit of regret. When Poppy hit a voicemail, she stayed on the line for 90 seconds: listening to the beep, waiting, eventually giving up. Ninety seconds of paid call time, per voicemail, across thousands of calls. Turning on Answering Machine Detection cut that to five to seven seconds. Same data, a fraction of the cost.
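To put a rough number on that saving, here is the arithmetic as a sketch. The function name is mine, and it assumes the bill scales linearly with seconds on the line, which real telephony pricing may not do (per-minute rounding, for instance).

```python
def voicemail_saving(before_seconds: float, after_seconds: float) -> float:
    """Fraction of paid voicemail time saved per call."""
    return 1 - after_seconds / before_seconds


# 90 seconds before, five to seven seconds after
best = voicemail_saving(90, 5)
worst = voicemail_saving(90, 7)
print(f"Saved per voicemail: {worst:.0%} to {best:.0%}")
# prints "Saved per voicemail: 92% to 94%"
```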
1,140 venues, and the shape of the answer
1,140 venues on the map. 522 green, meaning the venue said yes, come in. 526 amber, meaning possible with care: there is a step, a tight corner, a downstairs room, but you are not being turned away. 92 red, meaning no. Thirty London boroughs covered.
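As shares of the whole, those counts work out as follows. This is a small sketch using only the numbers above; the variable names are mine.

```python
counts = {"green": 522, "amber": 526, "red": 92}
total = sum(counts.values())  # 1140

# Each colour as a share of all venues on the map
shares = {colour: count / total for colour, count in counts.items()}
for colour, share in shares.items():
    print(f"{colour}: {share:.1%}")
# green: 45.8%
# amber: 46.1%
# red: 8.1%
```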
The headline for me is not the green number. It is the amber one.
London is not hostile, it is nuanced
Amber is level with green, in fact fractionally ahead of it. Red is 8%. That is a very specific picture of a city, and it is not the picture I expected when I started.
London is not hostile to buggies. Only eight in every hundred venues say no outright. The real story is in the middle: nearly half the city is some version of "yes, if". Yes, if you can fold it. Yes, if you sit by the window. Yes, if you do not need the loo. Yes, if you come before noon.
The accessibility conversation in this city is a conversation about nuance, and a green-or-red map would have flattened it into something less useful and less true. A binary answer would have told me London was half closed. The honest answer is that London is mostly open, on terms worth knowing in advance. That is only visible because the dataset is large enough to have a middle.
The self-improving dataset
The technically interesting part of Buggy Smart is not the calling. Voice agents that can hold a short conversation are a solved problem now. The interesting part is what happens after the call.
Every evening, a second pipeline runs. It takes past transcripts that the old classifier marked as "unclear" and re-evaluates them with updated rules. If the transcript contains a usable answer, the venue gets reclassified and pushed to the live map. That is how 184 venues appeared overnight without Poppy making a single new call.
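The evening pass can be sketched roughly like this. It is an illustration under assumptions, not the actual Buggy Smart classifier: the keyword rules, the `Venue` record and the `reclassify_unclear` function are all hypothetical stand-ins for whatever the updated rules really are.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical rules, checked in order: the first verdict with a matching phrase wins.
# Red and amber come before green so that "yes, but there is a step" lands on amber.
RULES = [
    ("red", ["no space", "not possible", "cannot bring"]),
    ("amber", ["step", "fold", "tight", "narrow", "downstairs"]),
    ("green", ["plenty of space", "no problem", "of course", "yes"]),
]


@dataclass
class Venue:
    name: str
    transcript: str
    verdict: str = "unclear"


def classify(transcript: str) -> Optional[str]:
    """Return a verdict if any rule matches, otherwise None."""
    text = transcript.lower()
    for verdict, phrases in RULES:
        if any(phrase in text for phrase in phrases):
            return verdict
    return None


def reclassify_unclear(venues: List[Venue]) -> List[Venue]:
    """Re-read only the 'unclear' venues and return the ones that changed."""
    changed = []
    for venue in venues:
        if venue.verdict != "unclear":
            continue  # existing verdicts are left alone
        verdict = classify(venue.transcript)
        if verdict is not None:
            venue.verdict = verdict
            changed.append(venue)
    return changed
```

Plain substring matching is crude (it would read "yesterday" as a yes), so treat the rules as placeholders. The part worth keeping is the shape: stored transcripts go in, only the "unclear" ones are re-read, and the list of changed venues is what gets pushed to the map.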
The dataset gets smarter without new calls. Old work gets a second reading. Mistakes get corrected retroactively, at scale, overnight, while I sleep. That is the bit I am most proud of, because it means every improvement to the classifier makes the entire back catalogue better, not just next week's work. It is the difference between a dataset and a living one.
From map to accreditation
The map is the proof. The business is what comes next.
There is a full deck that walks through the architecture, the numbers, and what I learned building this. The live map is at buggysmart.app. If you are a parent in London, go and use it. If you are a venue on it and the classification is wrong, tell me and I will fix it.
The next phase is accreditation: letting buggy-friendly venues claim their listing, display a verified badge, and use Buggy Smart as a proof point for the customers they actually want. The map earns the right to sell that badge, because the badge only means something if the data behind it is real. 1,140 venues in, it is.
This data should have existed a long time ago. I made it for myself. Now it does.