A Blog by Jonathan Low

 

Sep 12, 2025

As AI Gets Smarter, It Is Becoming More Expensive Rather Than Less

Concerns about the cost of AI have run rampant for more than a year now, and providers' promises that the results are worth it have begun to wear thin.

Now, in addition to the investment required for AI software and hardware development, users are finding that, contrary to industry promises, the cost of actually using AI is rising. The reason is that as AI developers work to provide better answers, their models consume more compute, measured in tokens, which raises costs. The question tech is beginning to face is at what point customers will decide that the outcomes no longer justify the price. JL

Christopher Mims reports in the Wall Street Journal:

As AI got smarter, it was supposed to become too cheap to meter, (but) for apps that make software or analyze documents, bills are higher than expected—and growing. The latest AI models are doing more “thinking” when used for research, AI agents and coding. So while the price of a unit of AI, known as a token, drops, the number of tokens needed to accomplish many tasks is skyrocketing. Despite the drop in cost per token, many new forms of AI re-run queries to check answers, gather extra intel or write their own programs before answering. AI agents will carry out a lengthy series of actions based on user prompts. They may deliver better responses, but can spend a lot more tokens in the process. While rate limits and dumber AI could help some AI-using startups, price hikes will drive customers away. And the really big players, which own their own monster models, can lose money while serving their customers.

As artificial intelligence got smarter, it was supposed to become too cheap to meter. It’s proving to be anything but.

Developers who buy AI by the barrel, for apps that do things like make software or analyze documents, are discovering their bills are higher than expected—and growing.

What’s driving up costs? The latest AI models are doing more “thinking,” especially when used for deep research, AI agents and coding. So while the price of a unit of AI, known as a token, continues to drop, the number of tokens needed to accomplish many tasks is skyrocketing.

It’s the opposite of what many analysts and experts predicted even a few months ago. That has set off a new debate in the tech world about who the AI winners and losers will be.

“The arms race for who can make the smartest thing has resulted in a race for who can make the most expensive thing,” says Theo Browne, chief executive of T3 Chat.

Browne should know. His service allows people to access dozens of different AI models in one place. He can calculate, across thousands of user queries, his relative costs for the various models.

Penny-wise, pound-foolish

Remember, AI training and AI inference are different. Training those huge models continues to demand ever more costly processing, delivered by those AI supercomputers you’ve probably heard about. But getting answers out of existing models—inference—should be getting cheaper fast. 

Sure enough, the cost of inference is going down by a factor of 10 every year, says Ben Cottier, a former AI engineer who is now a researcher at Epoch AI, a not-for-profit research organization that has received funding from OpenAI in the past.

The Cost of Doing Business

The price per token (for prompts and responses) for AI models at a given level of intelligence. The least “intelligent” models showed a roughly 9x decrease in cost per year, while the most capable ones dropped in price by roughly 900x per year.

[Chart: price per million tokens, on a logarithmic scale, 2022 through 2025. The slowest-falling tier, GPT-3.5 Turbo level or better on general knowledge, dropped roughly 9x per year; a mid-range tier covering other benchmarks and performance levels dropped roughly 40x per year; the fastest-falling tiers, GPT-4 and GPT-4o level or better on Ph.D.-level science questions, dropped roughly 900x per year. Source: Epoch AI]
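To make those decline rates concrete, here is a minimal sketch, assuming a hypothetical starting price of $10 per million tokens (an illustration, not a figure from Epoch AI):

```python
# A minimal sketch projecting per-token prices under the annual decline
# rates in the chart above. The $10 starting price is a hypothetical
# illustration, not a figure from Epoch AI.

def project_price(start_price: float, annual_decline: float, years: float) -> float:
    """Price per million tokens after `years`, falling `annual_decline`x per year."""
    return start_price / (annual_decline ** years)

for label, rate in [("slowest (9x)", 9), ("mid-range (40x)", 40), ("fastest (900x)", 900)]:
    price = project_price(10.0, rate, 2)
    print(f"{label:>16}: ${price:,.6f} per million tokens after 2 years")
```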

Despite that drop in cost per token, what’s driving up costs for many AI applications is so-called reasoning. Many new forms of AI re-run queries to double-check their answers, fan out to the web to gather extra intel, even write their own little programs to calculate things, all before returning with an answer that can be as short as a sentence. And AI agents will carry out a lengthy series of actions based on user prompts, potentially taking minutes or even hours.

As a result, they deliver meaningfully better responses, but can spend a lot more tokens in the process. Also, when you give them a hard problem, they may just keep going until they get the answer, or fail trying. 
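A minimal sketch of why this inflates bills: each agent step typically re-sends the accumulated context as input before adding new output, so total tokens grow much faster than the number of steps. All numbers below are invented for illustration; no particular model or API is assumed.

```python
# Why agentic "reasoning" multiplies token usage: each step re-sends the
# growing context as input before adding new output. All numbers here are
# invented for illustration; no particular model or API is assumed.

def agent_tokens(prompt_tokens: int, steps: int, tokens_per_step: int) -> int:
    """Total tokens billed across a run where each step resends prior context."""
    context, total = prompt_tokens, 0
    for _ in range(steps):
        total += context + tokens_per_step  # input context + new output
        context += tokens_per_step          # new output joins the context
    return total

print(agent_tokens(500, 1, 500))     # one-shot answer: 1,000 tokens
print(agent_tokens(500, 20, 2000))   # 20-step agent: 430,000 tokens
```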

Here are approximate amounts of tokens needed for tasks at different levels, based on a variety of sources (a rough cost sketch follows the list):

• Basic chatbot Q&A: 50 to 500 tokens

• Short document summary: 200 to 6,000 tokens

• Basic code assistance: 500 to 2,000 tokens

• Writing complex code: 20,000 to 100,000+ tokens

• Legal document analysis: 75,000 to 250,000+ tokens

• Multi-step agent workflow: 100,000 to one million+ tokens
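Turning those token counts into dollars is simple multiplication. Here is a back-of-envelope sketch, assuming the $3.44-per-million-token weighted-average GPT-5 price cited later in the piece and the upper end of each range above:

```python
# Back-of-envelope costs for the task sizes above, at the $3.44 per million
# tokens weighted-average GPT-5 price cited below. Token counts use the
# upper end of each listed range.

PRICE_PER_MILLION = 3.44  # dollars

tasks = {
    "Basic chatbot Q&A": 500,
    "Short document summary": 6_000,
    "Writing complex code": 100_000,
    "Legal document analysis": 250_000,
    "Multi-step agent workflow": 1_000_000,
}

for task, tokens in tasks.items():
    print(f"{task}: ~${tokens * PRICE_PER_MILLION / 1_000_000:,.4f}")
```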

Hence the debate: If new AI systems that use orders of magnitude more tokens just to answer a single request are driving much of the spike in demand for AI infrastructure, who will ultimately foot the bill?

Ivan Zhao, chief executive officer of productivity software company Notion, says that two years ago, his business had margins of around 90%, typical of cloud-based software companies. Now, around 10 percentage points of that profit go to the AI companies that underpin Notion’s latest offerings.

The challenges are similar—but potentially more dire—for companies that use AI to write code for developers. These “vibecoding” startups, including Cursor and Replit, have recently adjusted their pricing. Some users of Cursor have, under the new plan, found themselves burning through a month’s worth of credits in just a few days. That’s led some to complain or switch to competitors.

[Photo: Amjad Masad, founder and CEO of Replit, an AI service that helps coders and non-coders alike create apps, speaking at the Wall Street Journal CIO Network Summit. Bryan Bedder for The Wall Street Journal]

And when Replit updated its pricing model with something it calls “effort-based pricing,” in which more complicated requests could cost more, the world’s complaint box, Reddit, filled up with posts by users declaring they were abandoning the vibecoding app.

Despite protests from a noisy minority of users, “we didn’t see any significant churn or slowdown in revenue after updating the pricing model,” says Replit CEO Amjad Masad. The company’s plan for enterprise customers can still command margins of 80% to 90%, he adds.
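The article doesn't spell out the mechanics, but here is a minimal sketch of how an effort-based scheme might work in principle, with every rate below invented for illustration rather than taken from Replit:

```python
# A hypothetical sketch of effort-based pricing -- NOT Replit's actual
# formula. The idea: a request's price scales with the compute (tokens,
# tool calls) the agent actually spends. All rates are invented.

def effort_price(tokens_used: int, tool_calls: int,
                 per_million_tokens: float = 2.00,
                 per_tool_call: float = 0.01,
                 minimum_charge: float = 0.05) -> float:
    """Charge for one request, scaling with the effort it consumed."""
    cost = tokens_used / 1_000_000 * per_million_tokens + tool_calls * per_tool_call
    return round(max(cost, minimum_charge), 2)

print(effort_price(5_000, 2))     # simple edit: hits the $0.05 minimum
print(effort_price(400_000, 60))  # complex build-out: $1.40
```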

 

Some consolidation in the AI industry is inevitable. Hot markets eventually slow down, says Martin Casado, a general partner at venture-capital firm Andreessen Horowitz. But the fact that some AI startups are sacrificing profits in the short term to expand their customer bases isn’t evidence that they are at risk, he adds.

Casado sits on the boards of several of the AI startups now burning investor cash to rapidly expand, including Cursor. He says some of those companies are already pursuing healthy margins. For others, it makes sense to “just go for distribution,” he adds. Cursor didn’t respond to requests for comment.

One solution: dumber AI

The big companies creating cutting-edge AI models can, at least for now, afford to collectively spend more than $100 billion a year building out infrastructure to train and deliver AI. That includes well-funded startups OpenAI and Anthropic, as well as companies like Google and Meta that redirect profits in other lines of business to their AI ventures.

For all of that investment to pay off, businesses and individuals will eventually have to spend big on these AI-powered services and products. There is an alternative, says Browne: Consumers could just use cheaper, less-powerful models that require fewer resources.

For T3, his many-model AI chatbot, Browne is beginning to explore ways to encourage this behavior. Most consumers are using AI chatbots for things that don’t require the most resource-intensive models, and could be nudged toward “dumber” AIs, he says.

This fits the profile of the average ChatGPT user. OpenAI’s CFO said in October that three-quarters of the company’s revenue came from regular Joes and Janes paying $20 a month. That means just a quarter of the company’s revenue comes from businesses and startups paying to use its models in their own processes and products.

 

And the difference in price between good-enough AI and cutting-edge AI isn’t small.

The cheapest AI models, including OpenAI’s new GPT-5 Nano, now cost around 10 cents per million tokens. Compare that with OpenAI’s full-fledged GPT-5, which costs about $3.44 per million tokens, when using an industry-standard weighted average for usage patterns, says Cottier.
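At those two cited prices, the gap compounds quickly on token-hungry jobs; a quick sketch:

```python
# Comparing the two cited prices on a token-hungry job: GPT-5 Nano at
# roughly $0.10 per million tokens vs. full GPT-5 at the $3.44 weighted
# average. The one-million-token job size is illustrative.

NANO, GPT5 = 0.10, 3.44      # dollars per million tokens, as cited
JOB_TOKENS = 1_000_000       # e.g., one large multi-step agent workflow

for name, price in [("GPT-5 Nano", NANO), ("GPT-5", GPT5)]:
    print(f"{name}: ${JOB_TOKENS * price / 1_000_000:.2f} per job")
# A roughly 34x difference on every such job.
```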

While rate limits and dumber AI could help some of these AI-using startups for a while, those measures still leave them in a bind: price hikes will drive customers away, and the really big players, which own their own monster models, can afford to lose money while serving those same customers directly. In late June, Google offered its own code-writing tool to developers, completely free of charge.

Which raises a thorny question about the state of the AI boom: How long can it last if the giants are competing with their own customers?
