The Low-Down: Token Usage Is Not An Impact Measure: the Free-Spending AI Binge Is Over

Reports from large corporate users of AI indicate the beginning of an evolutionary change in their approach. Until now, consumption as calculated by token usage was the metric of choice. The assumption: if we're using AI, good things will happen. But surveys now reveal that massive AI spending has not only failed to deliver earnings; companies cannot even figure out how to measure gains - or their absence.

The result has not been a panicked pull-back, but the emergence in smart, forward-looking organizations of a more disciplined approach that seeks to test AI impact on desired outcomes, with subsequent, related adjustments not just in spending, but - have we been here before, tech users? - in how teams are organized, incentivized and led. Rinse and repeat: there is no such thing as plug and play. Optimized impact requires experimentation, iterative re-alignment and, yes, investment in the much-maligned 'soft stuff' like how to actually make a potentially disruptive new technology work effectively inside a large, complex, already fully functioning enterprise. Better late than never. JL

Anthony Lopopolo reports in Quartz:

For the past two years, the prevailing logic inside large tech companies was: the more AI, the better. Budgets were open-ended, consumption was the metric; anyone asking hard questions about returns was missing the point. That logic is breaking down. The free-spending phase is over. A McKinsey survey found more than 80% of respondents weren't seeing impact on earnings from generative AI. One study estimates that even with full AI automation, output gains would be capped at about 26% without changes to how teams review, test, and ship code. The companies now building governance frameworks around AI could capture returns the consumption-first approach never delivered. The companies getting the most from AI spending stop measuring consumption and start measuring returns. The tooling to do that exists. Doing it before caps calcify into budgets will determine whether this marks disciplined AI investment or just the end of the free-spending phase. "Token usage alone is not a measure of impact."

For the past two years, the prevailing logic inside large technology companies was simple: the more AI, the better. Budgets were open-ended, consumption was the metric, and anyone asking hard questions about returns was missing the point.
That logic is breaking down. Uber, Microsoft, and Meta have all moved in recent months to cap, redirect, or scrutinize AI spending in ways that would have been unthinkable during the tokenmaxxing peak. The free-spending phase is over.
What replaces it matters. The companies now building governance frameworks around AI have a real opportunity to capture returns that the consumption-first approach never delivered. The ones that just install spending caps and call it discipline may end up with less to show for their AI investment, not more.

The pullbacks are deliberate, not panicked
Uber burned through its entire 2026 AI budget in four months. The company's response was not to ban AI coding tools but to impose a $1,500 monthly cap per employee per tool, tracked through an internal dashboard, with an approval process for exceptions. It's one of the clearest examples of a pattern now visible across the largest technology companies.

The instinct is to read spending caps as austerity. The details suggest otherwise. Uber's cap applies separately to each tool, meaning a developer who uses both Cursor and Claude Code gets a $1,500 allowance for each, according to Bloomberg. The system is designed to force a conversation about cost, not to eliminate AI use.
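As a rough illustration of how a per-tool cap of this kind behaves, the sketch below totals one month of spend per employee per tool and flags which pairs would need an exception approval. The function name `check_spend`, the constant name, and the sample charges are hypothetical; only the $1,500 per-tool figure comes from the reporting, and nothing here describes Uber's actual dashboard.

```python
from collections import defaultdict

CAP_PER_TOOL_USD = 1500.0  # per employee, per tool, per month (figure from the reporting)

def check_spend(charges, cap=CAP_PER_TOOL_USD):
    """Sum charges per (employee, tool) and report which pairs exceed the cap.

    `charges` is an iterable of (employee, tool, usd) tuples for one month.
    Returns a dict mapping (employee, tool) -> (total_usd, needs_approval).
    """
    totals = defaultdict(float)
    for employee, tool, usd in charges:
        totals[(employee, tool)] += usd
    return {key: (total, total > cap) for key, total in totals.items()}

# Hypothetical month of charges for one developer using two tools.
charges = [
    ("dev_a", "Cursor", 900.0),
    ("dev_a", "Cursor", 700.0),
    ("dev_a", "Claude Code", 1200.0),
]
report = check_spend(charges)
```

Because the cap is applied per tool, this developer's combined $2,800 of spend triggers an approval request only for Cursor, not for Claude Code.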
Microsoft's decision to pull back employee access to Claude Code follows a similar logic but with an added strategic dimension. According to The Verge, the company's Experiences and Devices division is phasing out Claude Code licenses by the end of June, directing engineers toward GitHub Copilot CLI. An internal memo from Rajesh Jha described the six-month experiment with Claude Code as intentional benchmarking. Microsoft introduced a competitor's product to expose the shortcomings of its own tool, collected feedback, and then consolidated. Financial considerations played a role, too. The cutoff coincides with the end of Microsoft's fiscal year.

Meta's CTO Andrew Bosworth went furthest in articulating the underlying principle. "Nobody should be using AI tools just for the sake of using them," he wrote in an April memo to employees, according to the Wall Street Journal. "All motion is not progress and token usage alone is not a measure of impact of any kind."
These are not companies retreating from AI. They are companies that spent months watching token consumption function as a proxy for productivity and decided the proxy was broken.

The measurement gap is the real problem
The difficulty is that most companies don't yet have a clear alternative metric. A McKinsey survey found that more than 80% of respondents said their organizations weren't seeing tangible impact on earnings from generative AI. Tracking well-defined KPIs for AI solutions had the biggest effect on bottom-line impact among the 12 adoption practices McKinsey tested, yet most organizations haven't done that work.

The numbers from MIT's NANDA initiative are starker. Its GenAI Divide report, published in August 2025, found that 95% of enterprise AI pilot programs failed to deliver measurable financial returns. The core issue was flawed integration. Generic tools worked well for individuals but stalled when embedded in organizational workflows.
An NBER working paper tracking more than 100,000 GitHub developers quantified the gap between AI activity and output. Coding agents produced a 741% increase in lines of code and a 65% increase in pull requests, yet actual software releases rose by only 20%. The researchers attributed the gap to human bottlenecks in the production chain, estimating that even with full AI automation, final output gains would be capped at about 26% without changes to how teams review, test, and ship code.
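One way to build intuition for a ceiling like this is an Amdahl's-law style calculation: if only part of the end-to-end release pipeline can be accelerated, the untouched part bounds the overall gain. The sketch below is an illustrative assumption, not the model the NBER authors used; the function names and the treatment of the pipeline as a single serial process are simplifications introduced here.

```python
def overall_speedup(accelerated_fraction, stage_speedup):
    """Amdahl's law: overall speedup when only part of the work gets faster."""
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / stage_speedup
    return 1.0 / remaining

def implied_fraction(max_gain):
    """Fraction of work that must be fully automatable for a given ceiling.

    With unlimited stage speedup, overall speedup = 1 / (1 - p),
    so p = 1 - 1 / (1 + max_gain).
    """
    return 1.0 - 1.0 / (1.0 + max_gain)

# Under this simplified model, a 26% ceiling corresponds to roughly a fifth
# of the pipeline being automatable.
p = implied_fraction(0.26)

# Even a very large speedup of that fraction stays just under the ceiling.
gain = overall_speedup(p, 1000.0) - 1.0
```

Read this way, the remaining four-fifths of the work (review, testing, release) is what keeps a 741% jump in lines of code from turning into a comparable jump in shipped software.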

What structured governance looks like
The frameworks for measuring AI returns are emerging, even if adoption remains early. McKinsey's measurement model has five layers. Technical metrics, such as cost per query and system availability, sit at the bottom. Financial outcomes, such as operating profit and cost reduction, sit at the top. Business performance metrics connect the two. Each layer has a distinct owner and review cadence: the CTO owns the bottom, and the CFO owns the top.

BCG research found that companies it classified as "AI future-built" achieved five times the revenue increases and three times the cost reductions of other companies. The differentiating factor was not spending more on AI but scaling initiatives with clear outcomes, trained teams, and systematic measurement of returns.
Gartner's research points to a gap most organizations haven't closed. Policies establish intent but don't enforce behavior during live AI operation, where risks emerge dynamically. The organizations most likely to see returns are the ones that move from written rules to continuous monitoring and enforcement built directly into their systems.
The AI providers themselves are building the infrastructure for this. Anthropic's enterprise plans now include granular spend caps at the organization and individual user level, with usage analytics for Claude Code. OpenAI's ChatGPT Enterprise offers role-based usage limits, with admin alerts as the recommended default and hard caps reserved for exceptional cases. GitHub's shift to usage-based billing on June 1 replaced flat request counts with token-consumption measurement, making the cost of each interaction visible for the first time.

The risk of capping too hard
The counterargument to spending caps is real. A controlled experiment with GitHub Copilot found that developers with access to the tool completed a coding task 55.8% faster than those without. A GitHub study of 95 professional developers found the same magnitude of speedup.
The picture is more complicated for experienced developers on complex tasks. A METR randomized trial from early 2025 found that AI tools caused a 19% slowdown among experienced open-source developers, even as those developers believed they were working faster. A follow-up study starting in August 2025 with newer tools showed early evidence of speedup, with a point estimate of 18% faster, though the confidence interval still included the possibility of no effect.
The practical question is whether a $1,500 monthly cap is the right number, or whether any single number can be. The cost difference between models is vast. An analysis of current model pricing found that output tokens from Claude Opus 4.6 cost 83 times more than those from Llama 4 Scout. A developer running a frontier model for tasks a cheaper model could handle will hit a spending cap faster, with less to show for it, than one who routes work to the right model for each job.
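To make the routing point concrete, the sketch below compares a month of output-token spend when every task goes to a frontier model against the same workload with routine tasks routed to a cheaper model. Only the $1,500 cap and the 83x price ratio come from the article; the base price, the workload, and the names `monthly_cost`, `CHEAP_PRICE`, and `FRONTIER_PRICE` are hypothetical values chosen for illustration.

```python
CAP_USD = 1500.0      # monthly cap discussed in the article
PRICE_RATIO = 83.0    # frontier vs. cheap output-token price ratio cited in the article
CHEAP_PRICE = 0.30    # USD per million output tokens (hypothetical)
FRONTIER_PRICE = CHEAP_PRICE * PRICE_RATIO

def monthly_cost(tasks, route):
    """Total output-token cost in USD for a list of (kind, million_tokens) tasks.

    `route` maps a task kind to a price per million output tokens.
    """
    return sum(route(kind) * mtok for kind, mtok in tasks)

# Hypothetical workload: mostly routine work, a smaller share of hard work.
tasks = [("routine", 60.0), ("hard", 20.0)]

all_frontier = monthly_cost(tasks, lambda kind: FRONTIER_PRICE)
routed = monthly_cost(
    tasks, lambda kind: FRONTIER_PRICE if kind == "hard" else CHEAP_PRICE
)

over_cap_unrouted = all_frontier > CAP_USD
over_cap_routed = routed > CAP_USD
```

With these made-up numbers, the unrouted workload exceeds the cap while the routed one uses about a third of it, which is the sense in which a single dollar figure says little without knowing how the work was routed.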

That routing decision is where the real governance challenge lies. Caps are a blunt instrument. The companies that get the most from AI spending are the ones that stop measuring consumption and start measuring returns. The tooling to do that exists. Whether organizations adopt it before the caps calcify into permanent budgets will determine whether this moment marks the beginning of disciplined AI investment or just the end of the free-spending phase.