Fifteen-plus years in brand and culture. I have never written production code by hand. Everything in this product was built by AI coding agents working in long sessions, often several in parallel, often overnight. It runs live, with real users and a real business partner.
Beat the Gaffer is a World Cup prediction game with an AI pundit whose picks are public and permanent. For weeks the scoring engine had never scored a real match. The leaderboard had never moved. Opening night was the moment every promise got called in at once.
Mexico 2-0 South Africa lands in the database. One scheduled function fires and settles 141 predictions in a single run. Nothing breaks, nothing double-counts, nobody gets emailed by accident. The Gaffer calls the exact score and banks five points.
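What "nothing double-counts" can look like in practice, as a minimal sketch rather than the product's actual code. The five points for an exact score come from the story above; the `Prediction` type, the `settle` function and the zero awarded for anything else are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

EXACT_SCORE_POINTS = 5  # from the article: an exact score banks five points


@dataclass
class Prediction:
    # Hypothetical shape; the real schema is not described in the article.
    user: str
    home_goals: int
    away_goals: int
    points: Optional[int] = None  # None means "not settled yet"


def settle(predictions: List[Prediction], home_goals: int, away_goals: int) -> int:
    """Settle every unsettled prediction and return how many were settled."""
    settled = 0
    for p in predictions:
        if p.points is not None:
            continue  # already settled: a second run must not count it again
        exact = (p.home_goals, p.away_goals) == (home_goals, away_goals)
        # Assumption: only the exact-score rule is modelled in this sketch.
        p.points = EXACT_SCORE_POINTS if exact else 0
        settled += 1
    return settled
```

Running `settle` twice over the same list settles everything on the first pass and nothing on the second, which is the property a scheduled function needs if it can ever fire more than once.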
Deep in the scoring engine's full-time write sat a guard. Perfectly correct for as long as scores only appeared at full time. From the evening live in-play scores shipped, a score could exist at half time. That one guard would have silently blocked every final result of the tournament.
The guard was reworked the same night to check match status, not score presence, and pinned with tests that fail if the behaviour changes by a byte. Its first-ever production execution was the World Cup opener. It worked first time.
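One plausible reading of that rework, sketched under stated assumptions. It supposes the guard existed to stop a final result being written twice, and that it originally treated "a score is already stored" as "already finalised". The `Match` type, the `MatchStatus` values and both function names are hypothetical, not taken from the real codebase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchStatus(Enum):
    # Hypothetical status values for illustration.
    SCHEDULED = "scheduled"
    IN_PLAY = "in_play"
    FINISHED = "finished"


@dataclass
class Match:
    status: MatchStatus
    home_goals: Optional[int] = None
    away_goals: Optional[int] = None


def may_write_final_result_old(stored: Match) -> bool:
    # Old assumption: a stored score can only mean the match was finalised,
    # so any existing score blocks the full-time write.
    return stored.home_goals is None and stored.away_goals is None


def may_write_final_result_new(stored: Match) -> bool:
    # Reworked rule: only the stored status says whether the result is final.
    return stored.status is not MatchStatus.FINISHED


# A match at half time once live scores exist: score present, not finished.
half_time = Match(MatchStatus.IN_PLAY, home_goals=1, away_goals=0)
# A match whose final result has already been written.
finalised = Match(MatchStatus.FINISHED, home_goals=2, away_goals=0)
```

With the half-time row, the old check refuses the full-time write and the new one allows it. With the finalised row, both refuse, so the protection against double writes survives the change.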
Not luck. Not skill. Conventions.
Reviewed by nobody. Shipped at speed. Written by agents working in parallel, on a live product with real users and a real business partner. Every instinct from traditional software says this should end badly.
"I don't trust the code; I trust the rules it has to pass through before it reaches me."
Guidance that lives in a chat box and dies when the session ends.
Rules that survive between sessions and apply to whichever agent shows up next. Engineering teams have always had conventions. The new part: the rules replace the review, and a non-engineer can write them.
When live scores got the green light, an agent swept every file that reads match results, looking for places that assumed "score exists" meant "match finished". Forty-six were already safe: a convention written weeks earlier, applied by every agent in every session since.
The instruction set is not a manifesto. It is a changelog of everything that has ever gone wrong.
The question has stopped being "can the model do the work?" It can. The question is what surrounds the model.
The gates, the defaults, the evidence rules: that layer is not technical. It is brand, trust and taste decisions, encoded. It is how a strategist runs an engineering organisation staffed by agents: one human, doing the judgement.
The least glamorous discipline matters most: not claiming things until proven. An honest record lets a fresh agent, or a fresh human, pick up the work cold. Most organisations cannot do this with people. With machines, you can build it in.
Next morning, South Korea beat Czechia 2-1. The upgraded feed tracked the score live mid-match, the database followed it in real time, and the full set of predictions was scored about ten minutes after the final whistle. Not a one-off. A system.
Opening night proved the conventions hold. Now compounding gets its turn.
103 matches to go. A knockout rule waiting for its first penalty shootout. An AI pundit who will not shut up. The machine meets reality every day for five weeks.
mikelitman.me · hello@mikelitman.me