Seven days for ElevenLabs

The brief

The first interview was Monday 27 April at 11am. Angharad Knill, Head of Communications at ElevenLabs, hiring her first dedicated comms manager. London office, ex-Treasury, ex-Stripe. Forty-five minutes. By the end of it I had a clear picture of the shape of the role: both internal and external communications, weekly catchups with Mati, a flat structure, and a reviewer who would recognise editorial values when she saw them and would also catch made-up numbers from a hundred yards away.

She gave me one line that ended up structuring the rest of the week. The brief was meant as a written test. But she said: "feel free to. I know you also like building stuff. So... It's written as a written test. But respond to the questions however you will."

I took that as permission. The brief came in from Oscar, the recruiter, on Tuesday afternoon. Three options. I picked two. Option 2: a social audit and 90-day plan, focused on X and LinkedIn (the two channels the brief named). Option 3: a creative storytelling piece on something ElevenLabs builds that's underexplained or underappreciated.

I emailed Oscar that evening. Tuesday 5 May, 10am, was the deadline I proposed. He confirmed two days later. By the time he confirmed I was already a long way in.

This is a record of what those seven days looked like from the inside. It's a private post because it's mostly process, partly post-mortem, and partly the kind of thing you don't necessarily want crawled into someone else's training data. If I share it, it's because we know each other.

The shape

The two halves of the brief turned into three parallel workstreams.

The first was the written part itself: an audit of what's working and broken on ElevenLabs' X and LinkedIn presence, then a 90-day plan for what to do about it. Brief, classic, the kind of thing comms candidates have submitted in some form for two decades. The risk with this kind of deliverable is that it reads as competent and forgettable. So the actual work was less "what do I write" and more "how do I make a written audit feel like it was made by someone who'd actually do the job, not someone applying for it."

The second was the storytelling piece. I picked cross-lingual voice. The reason is short: Mati himself called it "totally underhyped" on the Sequoia Training Data podcast a few episodes back. If the founder is naming a capability as the most under-told story in the company, and the role I'm applying for is the comms manager, the comms manager's first move is to amplify that signal. So I wrote a long-form essay called Mother Tongue, on the case for a polyglot internet. I wrote it in English. I had it translated into Japanese, Mandarin, Arabic, Polish and Ukrainian. I generated audio in all six languages using my own cloned voice. The article reads itself in your language while you read it on the page. The thesis demonstrates itself in the artefact.

The third workstream wasn't in the brief but the brief implied it. Angy had mentioned in V1 that ElevenLabs doesn't yet have a publishing rhythm: the social media is "really underutilised", four launches go out on the same day, there's no central calendar, no infrastructure, no single person responsible. That's the comms manager's job description. So the third workstream became a working demo of what an ElevenLabs broadcast page could look like in practice. A hub with a methodology page, a transcript of the most recent Cheeky Pint episode (Mati and John Collison talking voice agents), a LinkedIn carousel preview, a derivative gallery showing how one hour of podcast became more than ten outputs through one workflow. Plus voice tours in six languages, three honest easter eggs, OG images for both pieces. The whole thing built and shipped at /elevenlabs-broadcast, noindex,nofollow, never meant to be public, designed to look like an internal alpha.

Three parallel workstreams. One submission. Seven days. Tuesday to Tuesday.

Why three streams instead of one

It's worth saying out loud that this was probably more than was strictly required. The brief asked for an audit and a plan. Option 3 was an alternative, not an addition. Most candidates would have done one or the other.

I did both, plus a working build, partly because Angy had explicitly said I could "respond to the questions however you will", and partly because the role itself isn't a written-test role. It's the first dedicated comms hire at a company at $11B valuation that's growing faster than its publishing infrastructure. The actual job, day one, is going to be: figure out what to ship, get it shipped, repeat. The written test is just the application form for that. The thing that proves I can do the job is doing a small version of the job.

So the three workstreams were three different proofs:

The audit and plan prove I can do the analytical-strategic written work the role still requires.
The hub proves I can ship a piece of editorial infrastructure end to end, on my own, with brand fidelity, in a week.
Mother Tongue proves I can identify a story underneath the company's surface and tell it in a form that's harder to do than it looks.

That last one needed its own thing because the whole point is that the polyglot internet argument doesn't land if you describe it. It lands when you experience it. Reading an article about cross-lingual voice in English is unconvincing. Listening to that same article in Japanese in the author's voice, when the author can't speak Japanese, is the argument made flesh. That's not a deck thing or an audit thing. That's its own piece.

Workstream A: the audit

The audit is one of those documents that everyone who has ever applied for a comms role has produced some version of, which is exactly why it's hard to write a good one. The default mode is generic. The default tone is "consultative report". The default register is "I have audited your social media presence and here are my recommendations" which makes the reviewer's eyes glaze over by the second page.

What I did instead: framed it as three calls. See, Cut, Grow.

See what's working. Cut what isn't. Grow what's underused. The verdict at the top, on its own, in an orange-bordered editorial poster: "ElevenLabs broadcasts. It does not yet publish."

That single line is the whole argument compressed. ElevenLabs has the audience, the assets, the founder voice, the recurring programmes. What it doesn't have yet is a publishing rhythm. Broadcasting is one-way: you push announcements out and hope they land. Publishing is a relationship: you commit to a cadence, your audience builds an expectation, and the content compounds against that expectation. The diagnosis is that ElevenLabs has all the inputs of a publisher and none of the patterns. The plan that follows is how you turn the first into the second.

The structure was five sections with mono-numeric §01 to §05 badges (a small typographic touch lifted partly from Stripe Press's house style and partly from Mother Tongue's own visual register, which I'd been sketching in parallel). What we saw. What to cut. What to grow. What the brand sounds like. Diagnosis.

The body was eight numbered findings (three / three / two), each tied to a specific piece of evidence. I built three SVG charts inline: founder voice versus brand voice (Mati's personal LinkedIn outperforms the brand by 7x on identical content), thread drop-off (audiences collapse 80 to 95% by tweet two on every multi-post launch), distribution asymmetry (the Maitri Mangal Instagram collab did 47.3 million views; the Impact Program testimonials of recipients in 49 countries do 493, 707 and 931 views).

The last one is the one I kept coming back to during the week. ElevenLabs has a $1B in-kind commitment programme called 1 Million Voices. Seven thousand people supported. SXSW docuseries narrated by recipients themselves. It's a remarkable initiative. The YouTube videos of those people speaking in their own voice get triple-digit views. Meanwhile a creator collab with a single influencer hits 47 million views. That's not a problem of effort or quality. That's a distribution problem. Which means it's a comms problem. Which means it's exactly the kind of thing the comms manager exists to fix.

Brand wit fires once a year. The audience is here for it the other 364 days too.

April Fools is the only verifiably annual brand-wit moment. The Maitri collab is February. So the brand has at most two days a year where it actually publishes the way it can publish. The rest of the year is pure broadcasting.

The diagnosis section closes with a comparison table of how four voice AI competitors publish:

Cartesia publishes through its founders.
Hume publishes through its founders.
OpenAI publishes through the model.
ElevenLabs publishes through the launch calendar.

The first three have an editorial spine. ElevenLabs has a release schedule. The plan that follows is how you build the spine.

Workstream A: the plan

The audit was 1,300 words. The plan was 5,500. The size difference matters because the plan is where the comms manager lives. The audit ends; the plan is the job.

I structured it as Kill, Fill, Build. Three phases, days 1 to 30, 31 to 60, 61 to 90.

Phase 1, Kill. What stops getting shipped. The constraint isn't what to ship; it's what to stop shipping. Kill the four-launches-on-one-day pattern. Kill the threaded tweets that cost 90% of the audience after the hook. Kill the white-card customer announcements that all look identical because they all use the same template. Kill the silence on reactive moments (the four-day window during the Senator Hassan letter where ElevenLabs published nothing on X main; competitors and policy people filled the space).

Phase 2, Fill. What gets shipped instead. A central editorial calendar, with weekly recurring slots: ElevenHacks weekly hackathon recap (the dev account does this; the main account never amplifies it), Mati's Note (the founder voice, 5x weekly to LinkedIn since that's where his content actually performs, edited by the comms manager), 1 Million Voices fortnightly storytelling (each story becomes its own piece, not a generic announcement). Same-day reactive capability (within hours, not days). A founder onboarding spec for new interviews (the question Angy raised: "how does Mati talk about whatever?", solved by having a doc).

Phase 3, Build. The new programmes that didn't exist yet. Ten of them, in a card grid in the deck, each with phase tags (P1/P2/P3 to show when they ship). A builder spotlight series. A monthly editorial wildcard. ElevenLabs Wrapped at year-end. A voice guide for cross-lingual writing. A "restraint as content" frame (less launching, more publishing). Postcards from the Summit. A voice agent that interviews back. Mati's notebooks (a recurring "what I learned this week" piece). Easter eggs aimed at journalists. Plus a dedicated journalist outreach list, 25+ named contacts across seven tiers (enterprise tech, business, AI policy, creator economy, public service, audio media, trade press) replacing the obvious eight publications candidates usually default to.

The plan also had a risk section. Five risks, with mitigations: launch fatigue, founder bottleneck, brand-vs-founder voice conflict, compliance friction with reactive comms, calendar overload. Each had a MITIGATED orange pill if I had a real answer for it, which I made sure I did before shipping.

The constraint is not what to ship. It is what to stop shipping.

The closing line, after a four-paragraph breathing section across Kill, Fill and Build: "By Day 90, ElevenLabs publishes."

Workstream A: the through-line

The thing I'm proudest of in the written piece is the through-line. See, Cut, Grow in the audit becomes Kill, Fill, Build in the plan. The framework rhymes but doesn't repeat. The verbs sharpen as the document moves from diagnosis to prescription.

That kind of small structural rhyme is a signal of editorial thinking. It tells the reader, without saying it, that the audit and the plan are one continuous argument, that the same person wrote both, that they were thought through together. It's the kind of thing comms work either has or doesn't have. There isn't a methodology for producing it. You either notice the option to do it or you don't.

I noticed. It took most of Sunday afternoon to weave it through.

Workstream B: Mother Tongue, the article

The Option 3 piece needed to be one specific thing: an essay you couldn't have written without being a person who'd actually used ElevenLabs to build with.

The thesis came together over a Wednesday afternoon walk. The polyglot internet is the bigger argument hiding inside cross-lingual voice. Mati calls cross-lingual "totally underhyped". Four voice agents in (Buggy Smart, With Moshi, First Order, Queue Index, all built and run on ElevenLabs in the first four months of 2026), I agreed. The argument was: voice was never going to stay in English; the internet's next chapter is multilingual by default; the company that builds the polyglot layer wins the next decade.

The trick of the piece is that it's structured to make the reader feel that argument, not just read it. Seven sections plus an editor's note. The first section sets the historical context: 49.6% of web content in English today, down from 55% three years ago, with English having compounded into web infrastructure since ASCII in 1963. The second section explains the technical shift: cross-lingual modelling, where a single base model serves dozens of languages with marginal cost per language collapsing toward zero. The third section gives the proof: Klarna, Revolut, Ukraine's DIIA, all running production voice infrastructure in 30+ languages today. The fourth section is the inflection: this is what changes. The fifth section answers who wins (ElevenLabs, by language coverage). The sixth section is ten truths. The seventh section is the personal close.

The personal close is the one that took the longest to write. I'd wanted to learn Japanese since I was sixteen. Over fifteen years later I signed up for an evening course at CityLit. I needed to speak with the hotel reservation team at the Park Hyatt Tokyo bar (the one from Lost in Translation) to make sure my engagement to my wife was meticulously arranged. I went every week for months and built a small library of phrases. I love trying them out at Japanese restaurants. The waiters are always surprised when I speak to them in their language.

But I still play Japan like a tourist. Bookshops I can't read. Menus I have to point at. Dinner conversations I half-follow through someone else's translation. The title of that film has been my running metaphor for the country ever since.

If someone asked me which language I'd choose to read, hear, write and speak the internet in, the answer would still be Japanese. So I'd stop being a tourist.

The pivot from the personal back to the thesis is one line: That answer is now a question with a deadline. Voice is the unlock. Cross-lingual quality is here. The cost has collapsed. The layer is being built.

Then: You can hear this in Japanese. In Mandarin. In Arabic. In Polish. In Ukrainian. The same words. My voice. None of which I speak.

Then the question: What language do you want the internet to speak to you in?

The answer is one word, displayed at the size of a building: Yours.

If the page works, that single word does what the whole article has been trying to do.

Workstream B: the audio rabbit hole

The article was going to be readable on the page in English. The harder question was making it listenable in six languages, in my voice, when I only speak one of them.

The first step was finding a test paragraph and translating it. I picked the meta-line from the close: "You can hear this piece in Japanese. In Mandarin. In Arabic. In Polish. In Ukrainian. The same words. My voice. None of which I speak." I subscribed to DeepL Pro (€8.99 a month, 30-day free trial, no data training), and translated it into the five non-English languages.

DeepL gave me clean translations. But when I read them carefully, I noticed something. The English source had the word "piece" in the opening sentence. In Japanese, Polish, Ukrainian and Arabic, piece carries an artistic or musical register. Hear this piece in those languages reads like hear this composition. That's not what the English meant. The English meant hear this thing you're reading. So I rewrote the English source to drop piece entirely. Re-translated. The translations cleaned up. All five languages naturally adopted the staccato five-fragment rhythm of the English. The editorial principle locked itself: natural phrasing per language. Not literal translation. Not parallel structure for its own sake. Each translation gets to be itself.

That principle ended up being the meta-comment of the whole article. The Mother Tongue thesis is anti-translation, pro-native-experience. So the article had to be translated by someone who'd dropped translation as a goal. Which is what DeepL plus this kind of editorial pass is.

Then the audio.

Mike's voice clone in ElevenLabs (ID hyhqfFYHAO4kiHdq7hSb, "Michael Litman", professional clone trained two years ago when the company was much smaller) generates English fine. Multilingual v2 default settings, stability 0.5, similarity boost 0.75, speaker boost on. That's the baseline.

I generated test paragraphs in all six languages overnight on the 28th of April. They sounded amazing. They sounded like me. Specifically. They had the cadence I have when I'm reading aloud, the slight hesitation before key words, the thing a colleague once described as "your voice sounds like you've thought about the sentence before you said it." I was delighted.

Then I ran each MP3 through ElevenLabs Scribe v1 (the company's own speech-to-text), comparing the transcription back against the locked DeepL translations.

The Japanese was wrong.

Specifically: the sentence 私の声 (my voice) was being transcribed as 私の家 (my house) by Scribe v1. I assumed it was a Scribe error, so I ran it through Scribe v1_experimental (a different model). That heard 私の経験 (my experience). Then I ran it through OpenAI Whisper. That heard 私のキャベ which isn't even a word in Japanese; it's a truncation. Three different STT models, three different incorrect transcriptions. Two-model agreement is the gold standard for STT verification. Three-model disagreement on the same word means the audio itself is genuinely ambiguous. The voice clone was mispronouncing 声.

This was a real problem because that line is the line. None of which I speak. The Japanese version had to be right.

So I dropped down through the model space. Tried v2 with stability 0.7 instead of 0.5; the Japanese drifted onto a different word (同じ言葉, the same words, became 同じ心, the same heart). Tried v3 (the newer model that supports 74 languages versus v2's 29). v3 transcribed cleanly. 私の声 came back correct in both Scribe models. But the voice fidelity was different. v3 sounded like me through a slight echo. v2 sounded like me in a room. Mike on a phone call versus Mike in person.

I spent most of Wednesday 30th of April A/B'ing v3 settings. Speaker boost on, off. Similarity boost 0.6, 0.75, 0.9. The eleven_turbo_v2_5 model as a fallback. None of them got both fidelity and accuracy. The trade-off was real: v2 sounded like me but mispronounced critical Japanese; v3 pronounced correctly but sounded like a slightly different person.

I locked v3 with similarity boost 0.9 for Japanese, kept v2 default for everything else, called it a mixed-model approach, and went to bed at three in the morning thinking the problem was solved.

The next day, Friday 1st of May, the parallel session that was building the voice tours for the hub came back with a different finding. In long-form prose (versus the short test paragraph), v3 sounded materially different from me. The same v3 setting that had been acceptable on a 13-second test was distinctly someone else on a one-minute paragraph.

So the answer wasn't a model swap. The answer was source preprocessing.

Japanese has a specific kind of failure mode in TTS: certain kanji and dates and acronyms misread because the model is making a probabilistic guess about pronunciation, and for some words the wrong probability wins. The fix is to rewrite the specific words in hiragana, where the pronunciation is one-to-one with the script. So I rewrote the source: 1963年 became せんきゅうひゃくろくじゅうさんねん (the year spelt phonetically). ASCII became アスキー (the Japanese-standard transliteration, which prevents the model from saying "asuck"). 母国語 (mother tongue) became ぼこくご (the same word in hiragana). 日本食レストラン (Japanese-food restaurant) became 日本料理のお店 (Japanese-cuisine shop) because the original was dangerously close to 日本酒 (sake) phonetically. 試す (to try) became 言ってみる (to try saying) because 試す sometimes substituted to 示す (to show). And critically, どれも私は話せない became どれもわたしは話せない (with hiragana watashi), because the clone occasionally substituted わし (a hard old-man's pronoun in regional Japanese) when わたし came up next to 声.

Each of these was a small surgical fix. But each had to be earned. I'd identify the candidate substitution, test the rewrite via Scribe and Whisper, confirm the substitution had stopped, listen to confirm the audio didn't sound robotic. Eight rewrites total. Then I regenerated the full Japanese, all eight sections, 22 minutes 47 seconds of audio, on v2 stability 0.7 with the preprocessing applied.

Scribe verification: zero bugs in §1, zero bugs in §7. Listening test: sounded like me. The fidelity-accuracy trade-off was solved by leaving the model alone and editing the source.

That work took a full day. It would have been faster to ship v3 and accept the slightly-wrong voice. But the whole article is about cross-lingual quality being good enough that you can't hear the seams. If the Japanese listener heard a stranger speaking, the argument failed at the threshold.

This is what I mean when I say the role isn't a written-test role. The thing I learned that day isn't on any deck. It was discovered in conversation between three Claude sessions running in parallel, each catching different pieces of the problem. One session was building the hub's voice tours and noticed v3's long-form drift first. One session was working on the article translations. One session was running the audio QA. They cross-pollinated their findings. Source preprocessing was the answer none of them had on its own.

The eighteen learnings I logged from the cross-lingual audio work are listed at the back of the master bible. Some are technical. Some are editorial. The most useful one is the one most people working on cross-lingual would skip, because it sounds banal: "Sounding amazing" is not the same as "saying the right words." You can love the audio and ship it wrong. Always verify via transcription. Always.

Workstream B: the page

The Mother Tongue page itself was built across Friday 1st of May and Saturday 2nd of May. It deployed at /elevenlabs-thinking on Saturday evening.

The structure is editorial-long-read by default but with one specific listening interface at the top. Six language buttons (English, Polish, Japanese, Mandarin, Arabic, Ukrainian) form a tour-2 strip above the article. Click any of them and the audio version of the entire article in that language plays, with karaoke-style word-level synchronisation. 21,426 word timestamps in total, verified across the six audio files.

Visual layer: the hero is the title MOTHER TONGUE rendered in five scripts (Latin, Cyrillic, Arabic, Mandarin, Devanagari), staggered. Section 1 has a real WIRED Japan magazine cover (mine, the issue I was interviewed in). Section 3 has a real Toniebox photo (my son's, sky blue variant), then a voice-waves SVG diagram showing one voice resolving into six native scripts. Section 4 has a photograph of my wife and me at the Park Hyatt Tokyo bar where we got engaged. The section breakers between paragraphs cycle through six glyphs of the word mother in each language: 母 (Japanese) then 母亲 (Mandarin) then أم (Arabic) then matka (Polish) then мати (Ukrainian) then mother (English).

The big "Yours." at the close is its own visual moment. Picking a language morphs that word into the equivalent in that language: あなたのもの。 / 你的。 / لك. / Twoje. / Твоя. / Yours. The morph is on click. The page meets the reader in their language at exactly the moment the article asks them what language they want to be met in.

Editor's note typography: Rock Salt at 15px, line-height 1.85. Locked late on Thursday after I rejected Caveat as comic-sans-y. The drop-cap on §1 is Cormorant Garamond at 5.4em, terracotta, float-left. The audio-champion background under the listener strip is a Japanese persimmon-bordeaux gradient with soft-gold accents (locked Saturday afternoon after I noticed two consecutive dark sections felt heavy and the red ties to the article's Japan threading).

These details look small written out. But they accumulate into the texture that makes the page feel made by a person rather than generated by a template. Each one took fifteen minutes to half an hour to settle. There are probably forty of them. It adds up.

Workstream C: the hub

The hub at /elevenlabs-broadcast was the largest single build of the week. Half of Thursday, all of Friday, a big chunk of Saturday and another half of Sunday. By the time I finished it had 11 HTML pages, 12 voice tour MP3s, 8 easter egg MP3s, 2 OG images, 6 self-hosted General Sans woff2 weights, 3 official ElevenLabs SVG brand assets, and a methodology page that walked through the whole pipeline from source to derivatives.

The centrepiece of the hub was a dissection of the most recent Cheeky Pint episode. Cheeky Pint is the Stripe-produced podcast where John Collison interviews other founders. The most recent one before the brief landed was John interviewing Mati. So that became the test case for the publishing rhythm I was proposing in the plan.

One hour of conversation. I dissected it into more than ten outputs. A 1,420-word editorial blog post. An eight-tweet X thread. A 1,479-character LinkedIn post. A four-paragraph newsletter (Buttondown-ready). A 14-slide LinkedIn carousel (1080x1080 screens). A 10-slide Instagram carousel (different voice, different visual register, same source quotes). A six-card quote-cards set (alternating light and dark). A printable A2 poster centred on the wedding-vows quote (T6 in the locked quote bank). An 1200x1500 infographic with sourced numbers and customer logos. A Speakerdeck deck (14 slides, 16:9). A full searchable transcript page with chapter timestamps and a JS-based search filter. Plus eight production briefs for video and audio derivatives I couldn't build without editing tools (60-second kinetic typography video, audiogram, long-form chaptered YouTube cut, TikTok-specific cuts, YouTube Shorts cuts, et cetera).

The methodology page walked through the workflow. Tier 1 work (the hero artefacts: blog, carousel, transcript) takes 3 to 4 hours. Tier 2 (the auto-generated derivatives) takes 2 to 3. Total: 5 to 7 hours from one hour of source. The before/after comparison bar showed it against the manual equivalent (25 to 30 hours, approximately 4x slower). Caption: "~4x faster. Same outputs. One person, not a team."

I want to be clear about what this is not. This is not "use AI to mass-produce content". The pipeline I built doesn't generate text. It does the structural work: cutting the source into the right beats, formatting the carousel, generating the OG image, building the searchable transcript. Every piece of prose is mine. The X thread is hand-written; it has eight tweets because eight is the right number for that specific argument; the hook is a beat I tested aloud. What the pipeline saves is the second-half-of-the-day work where you'd normally turn a finished blog post into a carousel, a poster, a thread, a newsletter; the part of the work that's mostly mechanical reformatting. That part can be 4x faster. The thinking can't.

The hub also had three honest easter eggs.

Press M. Hold the M key from anywhere on the page. A modal opens. My voice plays a six-language hello: Hi, I'm Mike in English, then 私はマイクです in Japanese, then 我是麦克 in Mandarin, then أنا مايك in Arabic, then jestem Mike in Polish, then я Майк in Ukrainian. About nine seconds of audio. The reason this exists: the Mother Tongue thesis is that voice should travel naturally. The easter egg demonstrates it at hello-volume.

Hover the eleven. The wordmark at the top of the hub has a styled "11" in it. If you hover over that 11 and your cursor crosses it eleven times (the JavaScript counts), a different audio plays: Eleven on eleven. Nice find. Hi from Mike, hidden in the wordmark. About six seconds. The title attribute on the element gives a subtle hint: "Try hovering me a few times."

View source. A comment at line 15 of the HTML reads . Anyone who view-sources the page (the kind of person who would, which is the kind of person worth easter-egging) finds it. The hub footer also has a subtle line, 11px, opacity 0.5: "PS: three easter eggs hiding in this page. Press M is the easiest. View source if you're curious."

There was originally a fourth easter egg called Founder Mode, where long-pressing a specific area would have triggered a Mati-voice clip. I dropped it on Friday morning. We don't have Mati's voice clone. Cloning his voice without consent would be sketchy ethically and almost certainly violate ElevenLabs' own consent rules. The original easter egg had been claiming to play Mati's voice while having no audio file attached. I'd built a fabrication into the artefact and not noticed for a week. I rewrote the eggs to be honest. Mike's voice. Real audio files. Accurate copy. Three eggs, not four.

That fix was one of those moments where the work becomes about its own ethics for half an hour. I'd built the fake Mati audio not deliberately but because I'd written a placeholder description that I never came back to. I'd shipped placeholder text from a draft. The lesson I took out of it: ship the audio before you ship the description, or the description will lie. It's the same principle as not writing copy that includes specifics without first listing the specifics. If you write the frame before the substance, the frame becomes the specification, and the substance comes in to fit it, and any gaps in the substance get papered over by the frame.

Workstream C: the late pivot

I want to write about the hub repositioning honestly because it was the most uncomfortable hour of the week.

By Sunday evening the hub was finished. Three sub-pages live (audit, plan as standalone scrollable HTML, plus the methodology page, transcript, carousel and all the Cheeky Pint derivatives). All voice tours wired up. Easter eggs honest. Mobile clean at 375px. Contrast scan passed at 19+/22 on every section. The submission email was drafted and pointed to the hub as the canonical entry point: visit the hub, find the audit and plan from there, plus everything else as extra credit.

I'd been getting feedback from a friend (Than, ex-Burberry, ex-Apollo, a strong editorial taste) throughout the week. On Monday morning, less than 24 hours from send, he came back with one line: "Is this homework or a brand mockup?"

That's the kind of feedback that, when you read it the first time, you want to push back on. The hub was deliberately ElevenLabs-branded; that was the point. It was the working demo of what an ElevenLabs broadcast page could look like. It was meant to look like a brand mockup.

But the second I read it, I understood what he meant. The submission was meant to answer the brief. The hub was answering a different brief: what would an ElevenLabs broadcast page look like. If a reviewer landed on the hub first, they'd see a brand mockup, not the deliverables. The audit and plan were inside the hub. They weren't the thing they encountered first. Which meant the homework, in some real sense, was hiding behind the proof of work.

The fix was structural and cost me half a day on Monday and the early hours of Tuesday: I built a new preface page at /elevenlabs-submission. Clean editorial layout, max-width 760px, no brand mockup. Header eyebrow "For Angharad Knill". H1 "The submission, in one place." A voice intro at the top, where my voice clone says hello to Angy and orients her in 50 seconds (with "An-ghee" written in the audio script to force the hard G; you'd be amazed how many TTS models say "Annie"). Then a lede, a bullet list of what's in Option 2 and Option 3, two preface paragraphs threading the See/Cut/Grow language into the Kill/Fill/Build language so the reader sees the through-line before they read either piece. Then three deliverable cards: PART 01 The audit / PART 02 The 90-day plan / OPTION 03 Mother Tongue. Then a footer with the Mike-as-builder thesis and a "Not an official ElevenLabs property" disclaimer.

The hub stayed live but became unlinked from the submission. Methodology page links stripped from the hub UI (nav, hero CTA, methodology block CTA, band CTA, footer). The five fabricated video cards in the Watch section that I'd shipped as illustrative placeholders got removed (only the one real Cheeky Pint episode card retained). The Subscribe form got a label clarifying it as illustration.

The submission email was rewritten too. Instead of pointing to the hub as primary, it now pointed to the preface. The hub was demoted to "extra credit", a phrase I lifted from the original brief itself, where Angy had said something to the effect of respond however you will.

The fix isn't to do less work. The fix is to put the answer first and the proof of work behind it.

I was tired enough on Monday afternoon that I almost shipped the original hub-first version. The cost of the rebuild was real (I'd spent four days on the hub assuming it was the front door; rewiring it as the back door took five hours of attention I didn't have). But the cost of shipping the wrong front door is much higher.

The B3 decision

Earlier in the week I'd built a deck version of the audit and plan. It was 31 slides, ElevenLabs-branded, a full visual walk-through of the whole argument. Sharp visuals, real charts, a pull-quote wall, a proper close.

I'd been going back and forth on whether to send the deck with the submission. Three options:

Option B1. Send everything. Hub plus written pieces plus deck plus Mother Tongue, four URLs in the email body.

Option B2. Send everything but the deck. Hub plus written pieces plus Mother Tongue, three URLs.

Option B3. Send the written pieces and Mother Tongue. Hold the deck back. Only send it if Angy engages: replies, asks a question, schedules a call.

I went with B3 and locked it on Saturday morning.

The reasoning: Angy is busy. She's reviewing N candidates' submissions. The amount she actually reads on first pass is probably 30 minutes, maybe less. If I send four URLs, the cognitive load of the submission email itself rises. She's likely to skim the lede, click whichever URL feels least intimidating, and form a first impression from there. If the first thing she clicks is the deck, she gets a maximalist artefact that feels like overkill. If the first thing she clicks is Mother Tongue, she gets a piece of editorial work that demonstrates the core thesis.

So: send what answers the brief most cleanly. Hold what proves enthusiasm in reserve, for the moment she signals she's interested. The B3 follow-up doc has four pre-drafted reply variants (V1 positive reply, V2 specific question, V3 call schedule, V4 silence-nudge) and a decision matrix that picks which deck version to send based on her engagement signal.

There's a deeper editorial point underneath the B3 decision. The submission isn't transactional. It's the start of a conversation. If you send everything in the first email, you've spent your whole hand on the open. If you hold a card back, you have something to play later, something that gives you a reason to follow up that isn't a check-in.

The deck got further refined after the B3 decision. I split it into an audit-talk (focused on what to fix, 17 slides) and a plan-talk (focused on what to build, 30 slides). Then I built -full versions of each (20 slides and 34 slides) with extra value-add content I could send if her question was technical and detail-oriented. PDFs of all four. The decision matrix in B3-TALK-FOLLOWUP.md spells out which version goes in which engagement scenario.

The deck won't be sent unless she engages. If she doesn't engage, it doesn't ship.

The middle of the week

Some honest texture. The middle of the week is where most of the friction lives.

Wednesday: working from a coffee shop in Highgate while my son was at nursery, drafting the audit body, hitting the wall on the See/Cut/Grow framing because I couldn't get past the temptation to make the audit a generic competitive analysis. Took a walk on the Heath. Came back and rewrote the verdict in twenty minutes.

Thursday: the 14 files shipped overnight by the parallel session that was building the hub. I woke up to a folder I didn't have when I went to bed. Spent the first hour mobile-checking everything at 375px. Found three small overflow bugs (reviewer-banner .dismiss button was 9x18 not 44x44, modal-shade was overflowing 4px from missing box-sizing, nav-toggle was 8px padding for a 38x38 minimum). Fixed all three in the morning. Then the cut-decisions draft and URL structure draft from the Opus-tier sub-agents arrived. Locked the URL structure as B (everything under /elevenlabs-broadcast/ with extensionless URLs via redirects).

Friday: all-day audio model trauma (described above). Plus the 23-outputs sweep. The hub's marketing copy had been claiming "23 distinct outputs" for the Cheeky Pint dissection. By the time I'd actually built it I'd shipped 11 derivatives plus 8 production briefs. Calling that 23 was overstating. Replaced "23 outputs" with "more than 10 outputs" / "10+" everywhere it appeared (5 files, 18 instances). Plus the Press M ethics fix.

Saturday: the day everything had to ship live. Hub deployed (commits 5df89bd, c0a187d, 5f3ae58). Mother Tongue deployed. Modal box-sizing fix for the 4px mobile overflow. Mother Tongue's per-card audio-langswap hidden because it was cluttering the now-playing stub. Speed buttons reduced from 3 to 2 because three didn't fit in the 74px stub. Cinema-bar (the 28px black bar at top and bottom of viewport when audio plays) hidden because it was visually heavy and added nothing. By Saturday midnight, both submission halves were live with 200 status codes on every URL.

Sunday: the deck splits. The decks went from one 42-slide combined version to four versions (audit-talk 17, audit-talk-full 20, plan-talk 30, plan-talk-full 34) plus the legacy combined 42 still kept for backward compat. Joanna Stern correction (I'd had her at WSJ; she left in Feb 2026 and is now NBC News plus her own venture New Things; corrected in 11 places). Mobile fix for the editorial calendar table on the plan decks (24px overflow at 375px; added @media (max-width: 768px) responsive table CSS). 8-task QA sweep ran at 16:00 (0 em-dashes across 4 decks, 0 banned words, claims check PASS, noindex,nofollow, mobile 375px clean). SHIP-STATE-FINAL.md captured at 16:30.

Monday: Than's feedback. Hub repositioning. Preface page. Voice intro for Angy. Hub demoted.

Tuesday morning: two sessions corrupted on resume. Rebuilt from SHIP-STATE-FINAL. By 10am I was already an hour over the original deadline because the deadline had been extended (I'd asked Oscar for the extension Monday evening, knowing the late repositioning would cost me the morning).

Tuesday afternoon: through-line work. "See, Cut, Grow" to "Kill, Fill, Build" woven across all four pages. Spelling sweep "programme" to "program" for ElevenLabs brand consistency (US spelling). Voice Guide timing moved from Week 2 to Week 4 because Week 2 was too optimistic. "Lede" to "lead" in the metrics section (American journalism spelling on a UK piece, caught it).

Tuesday 18:30: sent.

Things I learned

I'm not going to do the usual essay move where the back third is "lessons". A few things I'll note instead, briefly.

Cross-lingual voice is harder than people writing about cross-lingual voice realise. The model is impressive. The voice clone is impressive. Translation quality is impressive. None of these guarantees that what comes out of the speaker is what you intended. The verification layer is not optional.

Three parallel sessions is the right number for a single-week sprint. Two is too few (you bottleneck on the dependencies between them). Four is too many (the cross-pollination overhead eats the gains). Three lets each session focus while sharing findings at well-defined seams.

Source-of-truth documents are load-bearing. Each of the three workstreams had a session bible. Above them sat a SHIP-STATE document that was the recovery layer for all three. When two sessions corrupted on Tuesday morning, the recovery wasn't from those sessions' state; it was from SHIP-STATE. Bibles aren't bureaucracy; they're the thing you can rebuild from when something breaks.

The proof-of-work / answer-to-the-brief distinction matters. Almost every editorial mistake I caught and almost every one I almost shipped came from confusing one for the other. Putting the answer first and the proof behind it is a discipline.

Ship audio before description. Or the description will lie about the audio. The Press M fabrication taught me this and I'm going to remember it.

Don't write specifics without listing them first. This was the rule that kept me out of trouble on the audit. Every cited number, before it went into the body copy, had to live in verified-data-bible.md first. If it wasn't in the catalogue, it didn't make the audit. By the end of the week the bible had 110 specifics and zero invented numbers.

The personal stake is what the reader actually feels. Mother Tongue could have been a good piece without the Park Hyatt close. With it, it's a different argument. Personal stakes don't make writing self-indulgent if they're earned. They make it readable.

What's next

I won't know if the work landed for a few days or a few weeks. The submission is sent. Round 3, if it happens, is a chat with Angy to "drill into the thinking", which I think is the part I'm most prepared for, because I've spent seven days inside the thinking. If she engages, I have B3 ready.

If she doesn't engage, I have a portfolio of work that's strong on its own. The audit and plan are something I'd send a future hiring manager. The hub is shippable as a methodology demo. Mother Tongue is publishable as an essay. The cross-lingual learnings are worth a piece of their own. None of it was wasted, even if this specific door doesn't open.

I'm writing this on Tuesday night, three hours after sending. The bibles I built across the week, fifteen of them at last count, just got consolidated into a single master bible in Obsidian. This post is going to live alongside them, unlinked, private, the build log of the week.

It was a lot of work. I don't regret any of it. Even the parts that didn't ship made the parts that shipped sharper. The audit reads the way it does because of the four times I rewrote the verdict. Mother Tongue closes the way it does because of the seven attempts I made at the final word before locking on "Yours." The hub is the back door now because of the hour I spent on Monday afternoon arguing with myself about whether Than's feedback was right.

If you're reading this in some future hiring conversation: this is what a week looks like when I take a brief seriously. If you're reading this because you're me and you forgot half of what happened by Friday, here it is. If you're Angy and somehow you found this private URL: hello again. Thanks for the brief. It made me build something I'm proud of.

The English internet was an accident. The polyglot internet is on purpose. So was this week.

Mike Litman, 5 May 2026, ~21:30 BST.