A TALK BY MIKE LITMAN

Tokens are
the new currency

Everything in the AI stack now charges by the token.
This is not a coincidence. It's the business model.

First principles

A token is not a word.
It's not a letter.
It's the smallest unit of text
a model can process — and the thing
you pay for every single time.

Roughly 4 characters, or three quarters of a word. A short email is ~300 tokens. A novel is ~100,000. Every interaction has a price.
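
A quick sketch of that arithmetic in code. The 4-characters-per-token rule is only a heuristic; real tokenizers vary by model, so treat these as estimates, not billing figures.

```python
# Rough token estimates from character counts, using the ~4 chars/token
# heuristic above. Real tokenizers (and real bills) will differ.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

short_email = "x" * 1_200    # ~1,200 characters
novel = "x" * 400_000        # ~400,000 characters

print(estimate_tokens(short_email))  # ~300 tokens
print(estimate_tokens(novel))        # ~100,000 tokens
```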

01
OBSERVATION 01

It's not just the LLMs

ChatGPT. Claude. Midjourney. ElevenLabs. Luma AI. fal.ai. Netlify. The entire AI stack is converging on the same pricing model: tokens, credits, compute units. Text, images, voice, video, hosting — it doesn't matter. What started with language models is now the universal logic of the AI era. Every tool. Every platform. Every API. The meter is running everywhere.

ChatGPT tokens · Claude tokens · Midjourney credits · ElevenLabs credits · Luma credits · fal.ai tokens · Netlify compute

Every time an AI keeps you talking.
Every time a tool generates more options
than you asked for.

There's a meter running.

02
OBSERVATION 02

The incentive is to generate more

Token-based models reward engagement. Longer conversations, more output, multiple iterations — these all mean more tokens spent. The tools that serve you are also the tools that bill you. That's not a bug in the design. It's the business model. Understanding this changes how you interact with every AI tool you use.

Input tokens · Output tokens · Context window · API calls

03
OBSERVATION 03

Subscriptions are designed to make you stop counting

ChatGPT Plus. Claude Pro. Adobe Firefly credit packages. Flat-rate subscriptions exist to remove per-token anxiety. When you stop seeing the cost, you stop optimising. When you stop optimising, you spend more. The meter doesn't stop running. It just gets hidden behind a fixed monthly fee — which is exactly what the platform wants.

ChatGPT Plus £20/mo · Claude Pro £18/mo · No visibility on actual usage · Optimisation stops when cost disappears

Every time you iterate,
regenerate, or extend a conversation,
someone else's revenue goes up.

That is not accidental.
That is the design.

11B

tokens consumed building 20+ live AI products
in 10 weeks. At 94% cache efficiency.

Jan–Mar 2026 · verified Claude usage data

04
OBSERVATION 04

Cache is the working memory of AI

Prompt caching lets you pre-load context once and run thousands of calls on top of it at a fraction of the cost. Without it, you pay full price to re-read the same document on every API call. With it, the first call costs. The next thousand barely do. This is where token efficiency compounds.

94% cache reads · ~90% cost reduction on cached context · Claude prompt caching
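
A rough sketch of why that compounds. The per-token price and call volume below are illustrative assumptions, not any provider's rate card; the only figure taken from this slide is the ~90% reduction on cached reads.

```python
# Illustrative cost model for prompt caching.
# Assumes cached context is read back at ~10% of the normal input rate
# (the ~90% reduction cited above). Price and volumes are made up.
INPUT_PRICE_PER_MTOK = 3.00    # assumed £ per 1M input tokens
CACHED_READ_MULTIPLIER = 0.10  # cached reads at ~10% of base price

context_tokens = 50_000        # the document you'd otherwise re-send
calls = 1_000

without_cache = calls * context_tokens / 1e6 * INPUT_PRICE_PER_MTOK
with_cache = (
    context_tokens / 1e6 * INPUT_PRICE_PER_MTOK  # first call pays full price
    + (calls - 1) * context_tokens / 1e6 * INPUT_PRICE_PER_MTOK * CACHED_READ_MULTIPLIER
)

print(f"without cache: £{without_cache:,.2f}")  # £150.00
print(f"with cache:    £{with_cache:,.2f}")     # roughly a tenth of that
```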
05
OBSERVATION 05

Context windows are cognitive real estate

How much a model can hold in a single interaction is measured in tokens. 200,000 tokens — roughly two novels, by the arithmetic above. The larger the context, the more coherent the output. But coherence has a cost. Every token in the window is a token you're paying for. Knowing what to include, and what to leave out, is a skill.

Claude: 200K context · GPT-4: 128K context · Gemini: 1M+ context
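
One sketch of what "paying for the window" means per call. The price is an assumption for illustration; the point is the gap between a full window and a trimmed one.

```python
# Illustrative only: cost of one call at a full window vs. trimmed context.
PRICE_PER_MTOK = 3.00      # assumed £ per 1M input tokens

full_window = 200_000      # the whole window, roughly two novels of text
trimmed = 20_000           # only the excerpts the task actually needs

print(f"full window: £{full_window / 1e6 * PRICE_PER_MTOK:.2f} per call")  # £0.60
print(f"trimmed:     £{trimmed / 1e6 * PRICE_PER_MTOK:.2f} per call")      # £0.06
```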
06
OBSERVATION 06

Token prices are in freefall

The cost per token drops significantly with every model generation. What was unaffordable in 2023 is table stakes today. The floor is moving toward zero. But usage is expanding faster than price falls. The organisations that start building token-efficient systems now will hold a structural cost advantage when deflation reaches the floor.

Price per 1M tokens falling each generation · Usage expanding faster than price drops

The token budget is the
new media budget.

The organisations that manage it
like one will have an unfair advantage.

07
OBSERVATION 07

The hidden tax most finance teams can't see

Across your organisation, tokens are being spent right now. Team subscriptions. API bills. Netlify compute. Image generation credits. Voice synthesis. Video generation. It spreads across tools, teams, and budget lines — and no one has a complete picture. This is the invisible infrastructure cost of the AI era. Most finance teams have no idea what it's adding up to.

No single budget line · Spreads across every team · Invisible to finance · Growing every quarter

08
OBSERVATION 08

Agents don't prompt once. They loop.

A single conversational prompt costs hundreds of tokens. An AI agent — planning, executing, self-correcting, spawning sub-tasks — burns tens of thousands of tokens per run. Often more. Every major platform is pushing agentic AI right now. Most organisations have no idea what that will do to their token bills. The hidden tax is about to get a lot less hidden.

Multi-step planning loops · Self-correction cycles · Sub-agent spawning · 50–100x vs. single prompt
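
A back-of-envelope version of that gap. Step counts and per-step token figures are assumptions; the shape of the loop is what matters.

```python
# Rough agent-run cost vs. a single prompt. All figures are illustrative.
single_prompt_tokens = 500   # one conversational prompt
steps = 25                   # plan -> act -> check -> correct -> sub-tasks
tokens_per_step = 2_000      # re-read context + tool output + response

agent_run_tokens = steps * tokens_per_step
print(agent_run_tokens)                           # 50,000 tokens per run
print(agent_run_tokens // single_prompt_tokens)   # ~100x a single prompt
```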
09
OBSERVATION 09

Token strategy is business strategy

Which model. How much context to pass. When to cache. When to run calls in parallel. When to use a smaller model for simpler tasks. These look like engineering decisions. They're economic ones. And they compound across millions of API calls. The System pipeline: 4 generators running in parallel, ~105 seconds end to end, roughly £0.30 per full run. That's token strategy working.

~£0.30 per pipeline run · 4 generators, parallel · Sonnet for ops, Opus for creative · Model tiering by task
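
A minimal sketch of the model-tiering idea on this slide. The model names and routing rule are placeholders, not the pipeline's actual configuration.

```python
# Route routine work to a cheaper model; reserve the expensive one for
# tasks where output quality justifies the spend. Names are placeholders.
CHEAP_MODEL = "small-fast-model"       # e.g. a Sonnet-class model for ops
STRONG_MODEL = "large-creative-model"  # e.g. an Opus-class model for creative work

def pick_model(task_type: str) -> str:
    creative_tasks = {"copywriting", "concepting", "naming"}
    return STRONG_MODEL if task_type in creative_tasks else CHEAP_MODEL

print(pick_model("data_cleanup"))   # small-fast-model
print(pick_model("copywriting"))    # large-creative-model
```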
10
START HERE

Be precise, not conversational

Every vague prompt is a token tax you pay. Give AI a destination, not an invitation. Batch your requests — one well-constructed call beats five iterations. Match the model to the task: don't use a Rolls-Royce to go to Tesco. And stop regenerating before you've thought. The discipline of precision is now an economic skill.

Precise prompts · Batch over iterate · Right model, right task · Think before you type
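
One way to act on "batch over iterate": fold related asks into a single structured request instead of five back-and-forth turns. The prompt below is purely illustrative.

```python
# Combine several small asks into one well-constructed call.
asks = [
    "Summarise the brief in three bullets",
    "List the three biggest risks",
    "Draft a one-line subject header",
]
batched_prompt = "Complete all of the following in one response, numbered:\n" + "\n".join(
    f"{i}. {ask}" for i, ask in enumerate(asks, start=1)
)
print(batched_prompt)
```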
The punchline

As the cost per token approaches zero,
access to AI stops being the advantage.

The moat moves upward —
to judgement, taste, precision, strategy.

When intelligence is essentially free,
what you do with it is everything.

Mike Litman

Start counting
tokens.

mikelitman.me · hello@mikelitman.me
