Everything in the AI stack now charges by the token.
This is not a coincidence. It's the business model.
A token is not a word.
It's not a letter.
It's the smallest unit of text
a model can process — and the thing
you pay for every single time.
Roughly 4 characters, or about three-quarters of a word. A short email is ~300 tokens. A novel is ~100,000. Every interaction has a price.
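The arithmetic above can be sketched in a few lines. This uses the rough 4-characters-per-token rule of thumb from the text; real tokenizers vary by model and language, so treat it as an estimate, not a count.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (rule of thumb)."""
    return max(1, round(len(text) / 4))

short_email = "x" * 1200              # ~1,200 characters of plain text
print(estimate_tokens(short_email))   # ≈ 300 tokens
```

For anything billing-sensitive, use the provider's own tokenizer rather than a character heuristic.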
ChatGPT. Claude. Midjourney. ElevenLabs. Luma AI. fal.ai. Netlify. The entire AI stack is converging on the same pricing model: tokens, credits, compute units. Text, images, voice, video, hosting — it doesn't matter. What started with language models is now the universal logic of the AI era. Every tool. Every platform. Every API. The meter is running everywhere.
Every time an AI keeps you talking.
Every time a tool generates more options
than you asked for.
There's a meter running.
Token-based models reward engagement. Longer conversations, more output, multiple iterations — these all mean more tokens spent. The tools that serve you are also the tools that bill you. That's not a bug in the design. It's the business model. Understanding this changes how you interact with every AI tool you use.
ChatGPT Plus. Claude Pro. Adobe Firefly credits packages. Flat-rate subscriptions exist to remove per-token anxiety. When you stop seeing the cost, you stop optimising. When you stop optimising, you spend more. The meter doesn't stop running. It just gets hidden behind a fixed monthly fee — which is exactly what the platform wants.
Every time you iterate,
regenerate, or extend a conversation,
someone else's revenue goes up.
That is not accidental.
That is the design.
tokens consumed building 20+ live AI products
in 10 weeks. At 94% cache efficiency.
Jan–Mar 2026 · verified Claude usage data
Prompt caching lets you pre-load context once and run thousands of calls on top of it at a fraction of the cost. Without it, you pay full price to re-read the same document on every API call. With it, the first call costs. The next thousand barely do. This is where token efficiency compounds.
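Here is a minimal sketch of those economics. The prices are illustrative assumptions (a representative $3 per million fresh input tokens, with cache reads at roughly a tenth of that), not any provider's official rate card.

```python
# Illustrative prompt-caching economics. Prices are assumptions, not quotes.
FRESH = 3.00 / 1_000_000      # $ per input token, read fresh (assumed)
CACHED = 0.30 / 1_000_000     # $ per input token, read from cache (assumed)

context = 150_000             # shared document pre-loaded into the cache
calls = 1_000                 # calls that reuse that same context

without_cache = calls * context * FRESH
with_cache = context * FRESH + (calls - 1) * context * CACHED

print(f"no cache:   ${without_cache:,.2f}")   # re-reads the document every call
print(f"with cache: ${with_cache:,.2f}")      # pays full price once
```

Under these assumed numbers the cached run costs about a tenth of the uncached one, which is where the "first call costs, the next thousand barely do" effect comes from.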
How much a model can hold in a single interaction is measured in tokens. 200,000 tokens — about two novels. The larger the context, the more coherent the output. But coherence has a cost. Every token in the window is a token you're paying for. Knowing what to include, and what to leave out, is a skill.
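A quick worked figure makes the cost concrete. Assuming a representative (not official) $3 per million input tokens, a completely full 200,000-token window costs money on every single call:

```python
# Illustrative cost of filling a context window, assuming $3 per
# million input tokens. The price is an assumption for the sketch.
window = 200_000                      # tokens in a full context window
price = 3.00 / 1_000_000              # $ per input token (assumed)

cost = window * price
print(f"${cost:.2f} per call just to read the window")   # $0.60
```

Sixty cents per call sounds small until it runs thousands of times a day, which is why trimming the window is a skill worth paying for.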
The cost per token drops significantly with every model generation. What was unaffordable in 2023 is table stakes today. The floor is moving toward zero. But usage is expanding faster than price falls. The organisations that start building token-efficient systems now will hold a structural cost advantage when deflation reaches the floor.
The token budget is the
new media budget.
The organisations that manage it
like one will have an unfair advantage.
Across your organisation, tokens are being spent right now. Team subscriptions. API bills. Netlify compute. Image generation credits. Voice synthesis. Video generation. It spreads across tools, teams, and budget lines — and no one has a complete picture. This is the invisible infrastructure cost of the AI era. Most finance teams have no idea what it's adding up to.
A single conversational prompt costs hundreds of tokens. An AI agent — planning, executing, self-correcting, spawning sub-tasks — burns tens of thousands per run. Often more. Every major platform is pushing agentic AI right now. Most organisations have no idea what that will do to their token bills. The hidden tax is about to get a lot less hidden.
Which model. How much context to pass. When to cache. When to run calls in parallel. When to use a smaller model for simpler tasks. These look like engineering decisions. They're economic ones. And they compound across millions of API calls. The System pipeline: 4 generators running in parallel, ~105 seconds end to end, roughly £0.30 per full run. That's token strategy working.
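Running independent generation calls concurrently, as described above, can be sketched with standard asyncio. The `generate` function here is a stand-in for a real API call, and the four generator names are hypothetical; the point is that total wall-clock time tracks the slowest call, not the sum of all four.

```python
# Minimal sketch of parallel generation calls. `generate` is a placeholder
# for a real API call; names and timings are illustrative assumptions.
import asyncio

async def generate(name: str) -> str:
    await asyncio.sleep(0.1)      # stand-in for network + model latency
    return f"{name}: done"

async def pipeline() -> list[str]:
    tasks = [generate(n) for n in ("copy", "image", "voice", "layout")]
    # gather() runs all four concurrently: total time ≈ slowest call
    return await asyncio.gather(*tasks)

results = asyncio.run(pipeline())
print(results)
```

Sequentially, four 0.1-second calls take ~0.4 seconds; in parallel they take ~0.1. The same logic applied to real generators is how a multi-generator run stays around a fixed wall-clock time regardless of how many run side by side.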
Every vague prompt is a token tax you pay. Give AI a destination, not an invitation. Batch your requests — one well-constructed call beats five iterations. Match the model to the task: don't use a Rolls-Royce to go to Tesco. And stop regenerating before you've thought. The discipline of precision is now an economic skill.
As the cost per token approaches zero,
access to AI stops being the advantage.
The moat moves upward —
to judgement, taste, precision, strategy.
When intelligence is essentially free,
what you do with it is everything.
mikelitman.me · hello@mikelitman.me