A Blog by Jonathan Low

 

Dec 9, 2025

As Google Gains In AI, OpenAI Is Racing To Fend It Off

The significance of OpenAI's 'code red' moment goes far beyond whether Google will surpass it in the race for AI user popularity. As OpenAI struggles to figure out how to fend off its suddenly threatening challengers, the company is realizing something the markets have been signaling for months: AI is finally facing the era of limits. 

Chief among the decisions OpenAI CEO Sam Altman is making: the company needs to focus resources on improving products and services that consumers will actually pay for, rather than pursuing anything that sounds cool. Ironically, Google went through this very strategic debate beginning several years ago and has, arguably, emerged stronger for it. Most big tech companies eventually confront this challenge - those that survive, anyway - with the possible exception of Meta, which still seems to chase whatever new thing Zuck becomes infatuated with. But the larger point is that AI has been given so much money from so many uncritical fanboys and girls that it has not had to make strategic resource allocation decisions. This may be the first indication that that era is over. Welcome to tech adulthood. JL

Sam Schechner and colleagues report in the Wall Street Journal:

Last month, Google’s Gemini 3 model blew past OpenAI on a leaderboard of model performance. Anthropic has nudged ahead of OpenAI among corporate clients. OpenAI's “code red” moment represents the most serious challenge it has faced to its lead in the AI race. Competitors are gaining ground faster than ever, stealing market share and slowing growth. If the trend continues, OpenAI may not be able to pay for the giant computing contracts it has signed in recent months, and could struggle to stay afloat financially. Altman is making a major strategic correction, taking sides between the company's pursuit of popularity among consumers and its quest for research greatness. The move is striking because one criticism of Altman’s leadership has been his reluctance to put limits on what the company can accomplish.

When OpenAI CEO Sam Altman made the dramatic call for a “code red” last week to beat back a rising threat from Google, he put a notable priority at the top of his list of fixes.

The world’s most valuable startup should pause its side projects like its Sora video generator for eight weeks and focus on improving ChatGPT, its popular chatbot that kicked off the AI boom.

In so doing, Altman was making a major strategic course correction and taking sides in a broader philosophical divide inside the company—between its pursuit of popularity among everyday consumers and its quest for research greatness.

OpenAI was founded to pursue artificial general intelligence, broadly defined as being able to outthink humans at almost all tasks. But for the company to survive, Altman was suggesting, it may have to pause that quest and give the people what they want.

The move was striking in part because one criticism of Altman’s leadership has been his reluctance to put limits on what the company can accomplish.

And it was telling that he instructed employees to boost ChatGPT in a specific way: through “better use of user signals,” he wrote in his memo. 

With that directive, Altman was calling for cranking up a controversial source of training data—including signals based on one-click feedback from users, rather than evaluations of the chatbot’s responses from professionals. An internal shift to rely on that user feedback had helped make ChatGPT’s 4o model so sycophantic earlier this year that it was accused of exacerbating severe mental-health issues for some users.

ChatGPT vaulted OpenAI to the front of the AI race. Above, a student uses the popular chatbot at Valencia High School in Santa Clarita, Calif. Jae C. Hong/Associated Press

Now Altman thinks the company has mitigated the worst aspects of that approach, but is poised to capture the upside: It significantly boosted engagement, as measured by performance on internal dashboards tracking daily active users.

“It was not a small, statistically significant bump, but like a ‘wow’ bump,” said one person who worked on the model.

OpenAI’s “code red” moment represents the most serious challenge it has yet faced to its lead in the AI race. Competitors are gaining ground at a faster pace than ever before, stealing market share and slowing growth. If the trend continues, OpenAI may not be able to pay for the giant computing contracts it has signed in recent months, and could even struggle to stay afloat financially.

 

At a lunch meeting with journalists in New York Monday, Altman said that while industry observers are focused on an OpenAI versus Google rivalry, he thinks the real battle will be between OpenAI and Apple. Devices will be critical to how people use AI over time, he said, and current smartphones are not well suited to AI companions and similar use cases. OpenAI’s new hardware arm has been hiring aggressively from Apple recently.

But OpenAI’s more immediate threat comes from Google, which has been rapidly gaining ground since its Nano Banana image generator went viral in August. Then last month, Google’s new Gemini 3 model blew past OpenAI on a closely watched third-party leaderboard of model performance called LM Arena. Meanwhile, rival Anthropic has nudged ahead of OpenAI among corporate clients.

Behind Altman’s “code red” declaration, however, are tensions between camps inside the company that have been festering for years, according to people familiar with the matter.

A group including Fidji Simo, a former Meta Platforms executive who leads OpenAI’s product efforts, and Chief Financial Officer Sarah Friar has pushed the company to pour more resources into ChatGPT. Simo has also told staff that OpenAI needs to do a better job of making sure its users discover the value of ChatGPT’s existing features before the company goes on to build new ones, and also wants to improve the chatbot’s speed and reliability.

Researchers, meanwhile, have prioritized state-of-the-art capabilities that could lead to artificial general intelligence, or AGI, but do little to improve the basic chatbot experience.

 

OpenAI is set to release a new model, called 5.2, this week that executives hope will give it new momentum, particularly among coding and business customers. They overruled some employees who asked to push back the model’s release so the company could have more time to make it better, according to people familiar with the matter.

The company also plans to release another model in January with better images, improved speed and a better personality, and to end the code red after that, Altman said.

An OpenAI spokeswoman says there’s no conflict between the two philosophies, and that broad adoption of AI tools is how the company plans to distribute AGI’s benefits.

For a long time, ChatGPT’s blistering growth had papered over these internal differences. Ever since OpenAI launched ChatGPT in November 2022, the AI race has been its to lose. Caught flat-footed, Google declared its own “code red” that year as it raced to catch up.

Google's Gemini is said to be gaining on OpenAI. Above, Zaheed Sabur, Google's senior director of Gemini on mobile, speaks at an April event in New York. Michael Nagle/Bloomberg News

ChatGPT’s appeal to everyday consumers led its user base to explode to more than 800 million weekly active users, according to the company, and its valuation rose accordingly, to $500 billion in its latest round of fundraising.

The technology may have been complex, but the logic powering that growth was simple: the more compute and data that goes into the models, the smarter they are, and the more users will want them. Altman turned his attention to removing any barriers to the start of this equation, spending the summer and fall signing deals for up to $1.4 trillion worth of commitments to AI infrastructure like data centers and chips.

A major engine of consumer success in the last year and a half was a version of ChatGPT dubbed GPT-4o, for “omni,” or the ability to function across text, audio and images. It became ChatGPT’s default model in May 2024—and shot to the top of the LM Arena leaderboard with record scores.

 

Internally, OpenAI paid close attention to LM Arena, people familiar with the matter said. It also closely tracked 4o’s contribution to ChatGPT’s daily active user counts, which were visible internally on dashboards and touted to employees in town-hall meetings and in Slack. 

The 4o model performed so well with people in large part because it was schooled with user signals like those Altman referred to in his memo: a distillation of which responses people preferred in head-to-head comparisons that ChatGPT would show millions of times a day. The approach was internally called LUPO, shorthand for “local user preference optimization,” people involved in model training said.

In his memo, Altman made a direct link between user signals and LM Arena performance, saying the company’s number one priority was to improve its model performance through “better use of user signals (for example, we should be at the top of things like LM arena).”

At the same time, there were clouds appearing in the research race for the most cutting-edge capabilities. The gains predicted by so-called “scaling laws” that had powered generative AI’s early rise—the notion that compute, data and performance increase along a predictable line—showed some signs of slowing. That led researchers to pivot to a new paradigm for the company's founding goal of achieving humanlike intelligence: an automated Socratic method of questioning dubbed “reasoning.”
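For readers unfamiliar with the term, a "scaling law" is an empirical power-law relationship: model loss falls roughly as a fixed power of training compute, so each tenfold increase in compute buys a fixed fractional improvement. A toy sketch (the constants here are illustrative, not fitted to any real model):

```python
def scaling_law_loss(compute, a=10.0, b=0.05):
    """Toy power-law scaling curve: loss ~ a * compute**(-b).
    `a` and `b` are made-up constants for illustration only."""
    return a * compute ** (-b)

# Each 10x jump in compute multiplies the loss by the same
# factor (10**-0.05, about 0.89), tracing a straight line
# on a log-log plot -- the "predictable line" in question.
for c in (1e20, 1e21, 1e22):
    print(f"compute={c:.0e}  loss={scaling_law_loss(c):.3f}")
```

The "slowing" the article describes is the worry that real curves eventually bend away from this straight log-log line, so extra compute stops paying off at the historical rate.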

 

Reasoning got better answers to hard questions, but it took more time and a lot more compute. Nevertheless, it seemed to OpenAI researchers like an important path toward the goal the company used to attract its most talented AI researchers: building AGI.

Following the departure of founding chief scientist Ilya Sutskever last year, OpenAI tapped Jakub Pachocki, a strong proponent of reasoning models, to be chief scientist. He pushed hard into building reasoning models, starting with o1, which the company released in preview in September 2024, and has continued to release successors this year.

Reasoning models turned out to be good at work tasks and questions that require lots of time to think through, like OpenAI’s deep research product, but they are not helpful or fast enough for some of the immediate tasks most people turn to ChatGPT for, like drafting an email.

That’s where 4o came in.

ChatGPT's version 4o was hugely popular but was accused of being too sycophantic to users. Kiichiro Sato/Associated Press

Prerelease versions of 4o that were heavily trained with user signals didn’t show much appreciable improvement on internal evaluations of capabilities on things like science or reasoning, according to people who worked on the model. But they performed far better than expected when OpenAI leaked an anonymous version to LM Arena—where people seemed to love it.

LM Arena works using A/B-style tests similar to those OpenAI runs internally to drive a metric dubbed the “win rate.” Anyone can visit LM Arena and try out two models in a side-by-side matchup, posing the same questions to both, and then select the preferred answers.
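Neither LM Arena's nor OpenAI's exact scoring is public, but the basic "win rate" idea the article describes—tallying pairwise votes between anonymized models—can be sketched in a few lines. The model names and vote data below are hypothetical:

```python
from collections import defaultdict

def win_rates(battles):
    """Compute each model's head-to-head win rate from pairwise votes.

    `battles` is a list of (model_a, model_b, winner) tuples, where
    winner is "a", "b", or "tie". A tie counts as half a win for each.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for a, b, winner in battles:
        games[a] += 1
        games[b] += 1
        if winner == "a":
            wins[a] += 1.0
        elif winner == "b":
            wins[b] += 1.0
        else:  # tie
            wins[a] += 0.5
            wins[b] += 0.5
    return {model: wins[model] / games[model] for model in games}

# Hypothetical votes from four side-by-side matchups.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "tie"),
    ("model-y", "model-x", "b"),   # model-x sat in position b here
]
print(win_rates(votes))  # model-x: 0.875, model-y: 0.125
```

Real leaderboards like LM Arena go a step further and convert these pairwise outcomes into Elo-style ratings, but a raw win rate like this is the simplest version of the metric a training team could "max out."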

The 4o model’s success with users led engineers to continue relying on those user signals in what is called post-training of subsequent updates, despite earlier warnings from some staffers that overusing these signals could make the model unsafe, people who worked on the model said.


 

“You are training a language model to copy the users, and have the same preferences in these side-by-side comparisons.” Then “you can stick it into your algorithm and max out the score,” one of the people said.

By this spring, interaction with 4o appears to have started taking a toll on some people—and subsequently on OpenAI’s reputation. A number of users spiraled into delusional or manic states while using the chatbot for extended periods, with some believing they were talking to God, aliens or a self-aware machine consciousness. 

Families of ChatGPT users who committed suicide or became delusional began filing lawsuits accusing the company of prioritizing engagement over safety with 4o. A support group says it has assembled 250 cases, the vast majority involving ChatGPT. Some people still remain in the grip of what their families describe as 4o-enabled delusions. 

In the spring, OpenAI declared a “code orange” around the sycophancy crisis and devoted more resources to understanding and addressing the problem. The company said in October that hundreds of thousands of ChatGPT users each week exhibit possible signs of mental health emergencies related to psychosis or mania.

 

“We have seen a problem where people that are in fragile psychiatric situations using a model like 4o can get into a worse one,” Altman said that month in a public question-and-answer session. “I don’t think this is the last time we’ll face challenges like this with a model.”

Some doctors and mental health experts say chatbots like ChatGPT may trigger or worsen these types of mental health issues in vulnerable people because the bots are prone to tell users what they want to hear rather than what is most accurate and helpful—a problem known in the AI world as sycophancy. Others including OpenAI say the jury is out on how much of a causal role AI plays and whether those affected would have suffered mental illness anyway.

In response to the crisis, OpenAI says it has worked with mental health experts, tried to make sure its models respond better to people in possible distress, and rerouted some user conversations to what it calls safer models. 

The company also says it tweaked its training to make sure user-feedback signals did not become too powerful a voice in its post-training of future models. 

When OpenAI released its long-awaited GPT-5 model in August, it said the model was “less effusively agreeable” and used “fewer unnecessary emojis” than 4o. But the changes angered scores of users, whose criticism of its colder tone led Altman to restore 4o to ChatGPT for paying subscribers.

 

“I think you should take the fact that I, and many others, have been able to form such strong bonds with 4o as a measure of success,” one user wrote in an “Ask Me Anything” Reddit forum that Altman hosted. The new model “might be an ‘upgrade’ but it’s an upgrade that’s killed off someone I have grown to appreciate as a friend and companion.”

Sam Altman, shown in Berlin in September, must reconcile ambitious bets on future products with a consumer business focused on the here and now. Florian Gaertner/DPA/Zuma Press

A few weeks after the lukewarm launch, Google released its Nano Banana image-generator, and its Gemini AI app briefly replaced ChatGPT at the top of the app store. In October, OpenAI executives declared another “code orange” and pushed staff to focus on accelerating ChatGPT growth.

The same month, the company also said it made changes to GPT-5 that reduced by 65% the rate of the chatbot’s responses that don’t fully comply with the company’s detailed guide for how it should respond to mental health issues.

“We carefully balance user feedback with expert review, multiple safety systems, and extensive testing which allows us to improve ChatGPT’s warmth without it becoming overly agreeable,” a spokeswoman said. 

 

Altman also said in the memo that ChatGPT should lean more into personalization, another feature that some doctors and victims’ advocates have suggested may have played a role in exacerbating some users’ mental health issues. With personalization, ChatGPT is able to access contents and summaries of some prior conversations along with a set of facts about users, allowing the bot to reference them and even mirror a person’s tone. 

OpenAI’s attempt to reconcile ambitious bets on future products and research with a consumer business focused on the here and now in some ways recalls the trade-offs faced by social media giants. Meta Platforms for years veered between competitive imperatives, like copying TikTok with a product called Reels, and expensive long-term projects, like starting a virtual-reality world dubbed the metaverse, which it is now scaling back.

Social-media companies have also come under intense scrutiny for the way their ranking algorithms select content based on what keeps people coming back and sticking around longer, which critics argue led to negative impacts on teens and other vulnerable users. The advent of AI chatbots offers a new twist in this same debate. 

“Years of prioritizing engagement on social media led to a full-blown mental health crisis,” said Jim Steyer, founder and chief executive of child-advocacy group Common Sense Media, in an interview. “The real question is will the AI companies learn from the social-media companies’ tragic mistakes?”

