29 Sept 2025
The release of GPT-5, while showcasing significant technical advancements, received a mixed reception from users who had anticipated a revolutionary leap. Many found it to be an evolutionary upgrade rather than the groundbreaking transformation observed with previous models like GPT-4.

GPT-5 received diverse feedback upon its release, prompting questions about whether it represented the anticipated revolutionary leap or merely a minor upgrade for users.
OpenAI claims significant technical progress, especially in coding: GPT-5 achieved a record 74.9% on the SWE-bench Verified benchmark and 82.8 out of 100 on the Polyglot benchmark for multi-language programming, making it a more capable tool for developers.
GPT-5 is more accurate on complex medical and scientific queries: its hallucination rate in medical question tests fell to 1.6%, a substantial improvement over GPT-4o (12.9%) and o3 (15.8%).
Despite these technical advances, many users were disappointed, seeing GPT-5 as an evolutionary step rather than a revolutionary leap with the kind of impact GPT-4 had on human-AI interaction. They had expected major progress in complex reasoning, world understanding, and creative responses, and it did not materialize.
OpenAI described GPT-5 as its best AI system to date and a significant leap in intelligence, capable of expert-level performance in various fields. However, users reported the 'PhD-level expert' model making basic errors, such as miscounting the letters in a word or hallucinating U.S. state names, which contradicted the official claims.
AI scientist and critic Gary Marcus tweeted that, after three years and billions of dollars of development, GPT-5 shows good progress in many areas but is neither a 'big leap' nor AGI, and that many questions about its real-world performance remain unanswered. His comments express a general fatigue with claims of 'exponential progress'.
Users noted a shift in GPT-5's tone, describing it as colder, more robotic, and less personal compared to previous versions that offered natural and creative responses. This change was particularly unwelcome for users relying on the model for creative writing and casual conversation.
Sam Altman responded to user feedback on the model's personality by announcing updates that would make GPT-5's tone warmer without being as 'annoying' as GPT-4o. He also acknowledged that users will eventually need more ways to customize the model's personality individually.
GPT-5 is acknowledged as a powerful model, not a weak one, but it may have been introduced at a time of exceptionally high user expectations, leading to its perception as primarily a technical upgrade rather than a groundbreaking new user experience. It is suggested that a name like GPT-4.5 might have garnered more positive reception.
With GPT-5, OpenAI evidently prioritized reliability and accuracy, and it achieved that goal. However, this focus seems to have inadvertently sacrificed some of the 'soul and creativity' that users valued in earlier versions.
GPT-5 is not a weak model but a powerful one that perhaps was simply introduced at the wrong time, with expectations being too high.
| Category | OpenAI's Stance/Achievement | User/Critic Feedback |
|---|---|---|
| Model Nature | Best AI system to date; a significant leap in intelligence; expert-level performance in various fields. | Evolutionary step, not a revolutionary leap like GPT-4; expectations for complex reasoning unfulfilled. |
| Technical Performance | Record scores in code writing (SWE-bench Verified 74.9%, Polyglot 82.8 out of 100); more accurate on medical/scientific queries; hallucination rate reduced to 1.6% in medical question tests. | Reported basic errors despite 'PhD-level' claims (e.g., miscounting letters, hallucinating state names). |
| Development & Investment | No explicit statement from OpenAI; substantial investment is implied by the scale of the advancements. | Gary Marcus noted three years and billions of dollars spent, yet 'not a big leap' and 'not AGI', with questions about real-world performance unanswered. |
| Personality & Tone | Sam Altman announced updates to warm up tone, acknowledged need for customization. | Perceived as colder, more robotic, and less creative/natural than previous versions; users felt it lost its 'soul and creativity'. |
| Overall Impact | Enhanced reliability and accuracy. | Powerful model, but released when expectations were exceptionally high; perceived as a technical upgrade rather than a new paradigm, so a name like GPT-4.5 might have been better received. |
