Aug 12, 2025

Why OpenAI's GPT-5 Rollout Is Not Going Well - And What That Implies

The world used to be relatively forgiving of new releases, assuming - usually correctly - that the bugs would be worked out, especially given the amazing tasks the new product could already perform. No longer.

First, the sheer scale of AI hype, underscored by the mind-blowing amounts of money being invested, leaves less room for investor and customer tolerance. Add to that OpenAI's active pursuit of the "We're Number One" mantle, which leads to similarly inflated expectations. That it is still wildly unprofitable becomes yet another unspoken criticism. And with lots of competition only too happy to offer comparisons, the company cannot suffer too many more black eyes before serious questions about leadership emerge. That Sam Altman was forced to restore access to the new model's predecessor (taken away so that clients had to use the new offering) suggests that these problems may not be solved quickly - and that the company knows it has a problem. JL
 
Carl Franzen reports in VentureBeat:

Checking how the model is faring with early AI adopters indicates a chilly reception. User reports have emerged since GPT-5’s release showing it erring badly when solving relatively simple problems that preceding OpenAI models, and rivals from competing AI labs, answer correctly. An automatic “router” that chooses a thinking or non-thinking mode for the underlying GPT-5 model, depending on the difficulty of the query, has become one of the chief complaints. In real-world usage, Anthropic’s Claude Opus 4.1 seems to do a better job at “one-shotting” certain tasks. OpenAI co-founder and CEO Sam Altman announced the company would restore access to GPT-4o and other old models for selected users, admitting the GPT-5 launch was “more bumpy than we hoped for.”

Updated: shortly after this post’s publication, OpenAI co-founder and CEO Sam Altman announced the company would restore access to GPT-4o and other old models for selected users, admitting the GPT-5 launch was “more bumpy than we hoped for.”

The launch of OpenAI’s long-anticipated new model, GPT-5, is off to a rocky start, to say the least.

 

Even forgiving errors in charts and voice demos during the livestreamed presentation of the new model (actually four separate models, and a ‘Thinking’ mode that can be engaged for three of them), a number of user reports have emerged since GPT-5’s release showing it erring badly when solving relatively simple problems that preceding OpenAI models, and rivals from competing AI labs, answer correctly.

For example, data scientist Colin Fraser posted screenshots showing GPT-5 getting a math proof wrong (whether 8.888 repeating is equal to 9; it is, of course, not).




It also failed on a simple algebra problem that elementary schoolers could probably nail: 5.9 = x + 5.11.
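Both problems above can be checked by hand or with exact arithmetic. The short Python sketch below is only an illustration and is not taken from Fraser's posts; the variable names are made up for this example. It uses the standard library's `fractions` and `decimal` modules so that no floating-point rounding gets in the way.

```python
from decimal import Decimal
from fractions import Fraction

# 8.888... (the digit 8 repeating forever) is 8 + 8/9, which is 80/9.
repeating = Fraction(8) + Fraction(8, 9)
print(repeating)        # 80/9
print(repeating == 9)   # False
print(9 - repeating)    # 1/9, the gap that remains between 8.888... and 9

# Solve 5.9 = x + 5.11 for x by subtracting 5.11 from both sides.
# Decimal is built from strings here to keep the arithmetic exact.
x = Decimal("5.9") - Decimal("5.11")
print(x)                # 0.79
```

The repeating decimal falls short of 9 by exactly 1/9, and the algebra problem has the solution x = 0.79.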


Using GPT-5 to judge OpenAI’s own erroneous presentation charts also did not yield helpful or correct responses.

 

The older 4o model performed better for me on at least one of these math problems. Unfortunately, OpenAI is slowly deprecating those older models — including the former default GPT-4o and the powerful reasoning model o3 — for users of ChatGPT, though they’ll continue to be available in the application programming interface (API) for developers for the foreseeable future.

Not as good at coding as benchmarks indicate

Even though OpenAI’s internal benchmarks and some third-party external ones have shown GPT-5 to outperform all other models at coding, in real-world usage Anthropic’s recently updated Claude Opus 4.1 seems to do a better job at “one-shotting” certain tasks, that is, completing the user’s desired application or software build to their specifications. Developer Justin Sun posted one such example to X.

In addition, a report from security firm SPLX found that OpenAI’s internal safety layer left major gaps in areas like business alignment and vulnerability to prompt injection and obfuscated logic attacks. 

While anecdotal, checking the temperature on how the model is faring with early AI adopters seems to indicate a chilly reception.

AI influencer and former Googler Bilawal Sidhu posted a poll on X asking for a “vibe check” from his followers and the wider userbase, and so far, with 172 votes in, the overwhelming response is “Kinda mid.”

And as the pseudonymous AI Leaks and News account wrote, “The overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative.”

Tibor Blaho, lead engineer at AIPRM and a popular AI leaks and news poster on X, summarized the many problems with the GPT-5 rollout in an excellent post. He highlighted that one of the new marquee features, an automatic “router” in ChatGPT that chooses a thinking or non-thinking mode for the underlying GPT-5 model depending on the difficulty of the query, has become one of the chief complaints, given that the model seemed to default to non-thinking mode for many users.

Competition waiting in the wings

Thus, the sentiment toward GPT-5 is far from universally positive, highlighting a major problem for OpenAI as it faces increasing competition from major U.S. rivals like Google and Anthropic, and a growing list of free, open source and powerful Chinese LLMs offering features that many U.S. models lack.

Take the Alibaba Qwen Team of AI researchers, who just today updated their highly performant Qwen 3 model to support a 1-million-token context, giving users the ability to exchange nearly 4x as much information with the model in a single back-and-forth interaction as GPT-5 offers.

Given that OpenAI’s other big release this week, the new open source gpt-oss models, also received a mixed reception from early users, things are not looking up right now for the number one dedicated AI company by users (700 million weekly active users of ChatGPT as of this month).

Indeed, this is also exemplified by users of the betting marketplace Polymarket overwhelmingly deciding following the release of GPT-5 that Google would likely have the best AI model by the end of this month, August 2025.

Other power users, like Otherside AI co-founder and CEO Matt Shumer, who received early access to GPT-5 and blogged about it favorably in a review, opined that views would shift as more people figured out the best ways to use the new model and adjusted their integration approaches.

While it’s still early days for GPT-5 — and the sentiment could change dramatically as more users get their hands on it and try it for different tasks — the early indications are not looking like this is a “home run” release for OpenAI in the same way that prior releases such as GPT-4, or even the newer 4o and o3, were. And that’s a concerning indicator for a company that just raised yet another funding round, yet remains unprofitable due to its high costs of research and development.
