Open ChatGPT. Generate a Midjourney image. Clone your voice in ElevenLabs. Deploy a site on Netlify. Create a product video on Luma AI. Every one of those actions has a unit cost measured in tokens, credits, or compute units. Different names. Same logic. The entire AI stack has converged on a single pricing model, and most of the people using it every day have no idea.

This is not a coincidence. It is the business model.

What a token actually is

A token is not a word. It is not a letter. It is the smallest unit of text a language model processes: roughly four characters, or about three quarters of an English word, on average. A short email is around 300 tokens. A short novel is around 100,000. An API call bills you for the tokens you send in, the tokens the model sends out, and everything sitting in the context window in between. Every character you type, every word the model returns, every piece of context you pass: tokenised.

The reason this matters is that you pay for all of it. And the pricing is invisible by design.
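The four-characters-per-token rule of thumb above is enough for back-of-envelope budgeting. A minimal sketch, using only that heuristic (real tokenisers such as OpenAI's tiktoken vary by model, and the prices here are placeholders, not any platform's actual rates):

```python
# Rough token and cost estimator. The ~4 characters per token ratio is
# the rule of thumb from the text; real tokenisers differ per model.
# The per-million-token prices are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    """Approximate token count at roughly four characters per token."""
    return max(1, round(len(text) / 4))

def estimate_cost(prompt: str, expected_output_tokens: int,
                  usd_per_million_in: float,
                  usd_per_million_out: float) -> float:
    """Back-of-envelope bill: tokens in plus tokens out, priced separately."""
    tokens_in = estimate_tokens(prompt)
    return (tokens_in * usd_per_million_in
            + expected_output_tokens * usd_per_million_out) / 1_000_000

email = "x" * 1200                 # a short email, ~300 tokens by the heuristic
print(estimate_tokens(email))      # → 300
```

Note that input and output tokens are almost always priced differently, which is why the estimator keeps them separate.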

It is not just the language models

The token economy did not stop at ChatGPT and Claude. Midjourney charges by the generation, pegged to compute. ElevenLabs charges by the character and by the voice. Luma AI charges by the second of video. fal.ai charges by the inference. Netlify charges by the compute unit. Every major AI platform, regardless of what it produces, has adopted some variant of the same model: you pay per unit of output.

This is new. The previous generation of software charged a fixed fee for access: a seat licence, a subscription, a monthly flat rate. The product existed whether you used it or not. The cost was decoupled from usage. That model is disappearing. In its place is metered billing: the more you use, the more you pay. The more the AI generates, the larger the bill.

The incentive to generate more is built in

Token-based pricing creates a structural incentive that runs directly counter to your interests as a user. The platform's revenue goes up every time you iterate, regenerate, extend a conversation, or ask for more options than you actually need. That is not a bug in the design. It is the mechanism. The tools that serve you are also the tools that bill you, and their interests are not aligned with yours when it comes to efficiency.

This is worth sitting with. Every time an AI keeps you talking, every time a tool offers six variations when you needed one, every time a model hedges instead of giving a direct response: there is a meter running. You are the customer and the revenue source simultaneously, and the product is optimised for the latter.


Subscriptions are designed to make you stop noticing

ChatGPT Plus. Claude Pro. Adobe Firefly credit packs. These exist, in part, to remove the per-token anxiety from the equation. When you pay a flat fee, you stop counting. You stop optimising. You start treating the tool as unlimited, which is exactly what the platform wants. The meter does not stop. The subscription just hides it behind a fixed monthly number. When you stop seeing the cost, you stop managing it.

Agents are about to make this dramatically worse

Everything above applies to conversational AI: one prompt, one response, one unit of cost. Agents operate differently. An AI agent does not prompt once and wait. It plans, executes, checks its work, spawns sub-tasks, self-corrects, loops, and repeats. A single conversational prompt might cost a few hundred tokens. A single agentic workflow can cost tens of thousands. In complex multi-step tasks, the multiplier can be 50 to 100 times what you would pay for a direct prompt.
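The multiplier comes from two compounding effects: an agent makes many calls per task, and each call re-sends the growing context from the calls before it. A toy model, with step counts and retry rates that are illustrative assumptions rather than measured platform figures:

```python
# Illustrative comparison of a single prompt against an agent workflow.
# All per-step numbers are assumptions chosen to show the compounding,
# not measurements from any real agent framework.

SINGLE_PROMPT_TOKENS = 400             # one prompt, one response

def agent_tokens(steps: int, tokens_per_step: int, retry_rate: float) -> int:
    """Total tokens for an agent that plans, executes, checks and retries.
    Each call re-sends the accumulated context, so cost compounds."""
    total = 0
    context = 0
    for _ in range(steps):
        total += tokens_per_step + context   # context is re-sent every call
        context += tokens_per_step           # context window grows each step
    return int(total * (1 + retry_rate))     # self-correction adds overhead

workflow = agent_tokens(steps=8, tokens_per_step=600, retry_rate=0.2)
print(workflow, workflow // SINGLE_PROMPT_TOKENS)   # → 25920 64
```

Even with these modest assumptions, eight steps and a 20% retry overhead put the workflow in the tens of thousands of tokens, roughly 64 times the single prompt: squarely inside the 50 to 100 times range described above.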

Every major platform is pushing agentic AI right now. Anthropic, OpenAI, Google, Microsoft: all of them have agent frameworks either live or in preview. Most organisations have no idea what that will do to their token bills. The hidden cost is about to become significantly less hidden.

The hidden tax nobody is tracking

Here is what is actually happening across most organisations using AI: the spend is diffuse, invisible, and growing. Some teams have ChatGPT Plus seats. Others have API access billed to a shared account. Designers are using Midjourney. Marketing are using ElevenLabs. The product team is deploying on Netlify and spinning up inference endpoints on fal.ai. Video generation is happening somewhere. Nobody has a complete picture. It is not in any single budget line.

This is the invisible infrastructure cost of the AI era. The organisations that surface it, track it, and manage it as a deliberate line item will have a structural advantage over those that treat it as miscellaneous software spend.

I have spent eleven billion tokens building over twenty live AI products in ten weeks. I know this because I track it. 94% of those were cache reads, which means I pay a fraction of the full cost by pre-loading context and reusing it across thousands of API calls. The System pipeline I built runs four generators in parallel, produces a full strategy pack in roughly 105 seconds, and costs around 30 pence per run. That is not luck. It is the result of treating token spend as a real cost with real levers.

The difference between 94% cache efficiency and 0% is, at scale, the difference between a manageable infrastructure cost and an unmanageable one.
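That difference is easy to put numbers on. A sketch, assuming cache reads are billed at a tenth of the full input rate (some providers discount roughly that much; substitute your provider's actual ratio) and an illustrative $3 per million input tokens:

```python
# Effective input cost with prompt caching. The 0.1x cache-read rate and
# the $3-per-million base price are assumptions for illustration only.

def effective_cost(total_tokens: float, cache_hit_rate: float,
                   full_rate: float, cache_read_rate: float) -> float:
    """Blend full-price and cached-read tokens into one bill."""
    cached = total_tokens * cache_hit_rate
    fresh = total_tokens - cached
    return fresh * full_rate + cached * cache_read_rate

FULL = 3.00 / 1_000_000        # assumed $3 per million input tokens
CACHED = FULL * 0.1            # assumed cache reads at 10% of full rate

no_cache = effective_cost(11_000_000_000, 0.00, FULL, CACHED)
with_cache = effective_cost(11_000_000_000, 0.94, FULL, CACHED)
print(round(no_cache), round(with_cache))
```

Under those assumptions, eleven billion tokens at 0% cache efficiency bill at $33,000; at 94% they bill at roughly $5,000. Same work, different levers.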

Token prices are falling. The challenge is growing faster.

The cost per token drops with every model generation. What was expensive in 2023 is cheap today. This is real and it matters. The floor is moving toward zero.

The problem is that usage is expanding faster than prices are falling. AI is not replacing expensive tasks with cheap equivalents. It is making entirely new categories of task possible, which are then adopted at scale, which generates usage volumes that more than offset the per-unit price decline. The bill goes up even as the rate goes down.

The moat is moving upward

Here is the thesis that sits underneath all of this. As the cost per token approaches zero, access to AI stops being the competitive advantage. The capability gap between organisations that use AI and those that do not will close. What remains when that gap closes is judgement: knowing what to ask for, how to ask for it, when to use a smaller model, what context to pass and what to leave out, how to cache intelligently, how to batch efficiently.

The organisations that develop those skills now, before intelligence is essentially free, will be better positioned when it gets there. The moat moves upward: from access to judgement, from infrastructure to taste, from having the tools to knowing how to use them well.

Start counting tokens. Not because it will feel like it matters right now, but because it will.