GPT-5: A Technical Leap or an Evolutionary Step?

By Amir Ramzali
Financial markets analyst and educator

29 Sept 2025

The release of GPT-5, while showcasing significant technical advancements, received a mixed reception from users who had anticipated a revolutionary leap. Many found it to be an evolutionary upgrade rather than the groundbreaking transformation observed with previous models like GPT-4.

Key Points Summary

GPT-5 Initial Reception
GPT-5 received mixed feedback upon its release, raising the question of whether it was the revolutionary leap users had anticipated or merely an incremental upgrade.
Technical Advancements in Code Writing
OpenAI claims significant technical progress, especially in code writing: GPT-5 achieved an unprecedented 74.9% on the SWE-bench Verified benchmark and 82.8% on the Polyglot benchmark for multi-language programming, giving developers a more powerful tool.
Improved Accuracy and Reduced Hallucination
GPT-5 answers complex medical and scientific queries more accurately. Its hallucination rate in medical question tests fell to 1.6%, a substantial improvement over GPT-4 (12.9%) and GPT-3 (15.8%).
User Disappointment Regarding Revolutionary Progress
Despite technical advancements, many users expressed disappointment, perceiving GPT-5 as an evolutionary step rather than a revolutionary leap, unlike GPT-4's impact on human-AI interaction. Users expected major progress in complex reasoning, world understanding, and creative responses that did not materialize.
OpenAI's Official Stance and User Perception Discrepancy
OpenAI described GPT-5 as its best AI system to date and a significant leap in intelligence, capable of expert-level performance in various fields. However, users reported instances of the 'doctor-level expert' model making basic errors, such as miscounting letters in a word or hallucinating U.S. state names, contradicting official claims.
Criticism from Gary Marcus
AI scientist and critic Gary Marcus tweeted that, after three years and billions of dollars in development, GPT-5 shows good progress in many areas but is neither a 'big leap' nor AGI, and that many questions about its real-world performance remain unanswered. His comments also expressed a general fatigue with claims of 'exponential progress'.
Change in Model's Tone and Personality
Users noted a shift in GPT-5's tone, describing it as colder, more robotic, and less personal compared to previous versions that offered natural and creative responses. This change was particularly unwelcome for users relying on the model for creative writing and casual conversation.
Sam Altman's Response to Personality Feedback
Sam Altman addressed user feedback on the model's personality, announcing updates to make GPT-5's tone warmer without being as 'annoying' as GPT-4o. Altman also acknowledged that users will need to be able to customize the model's personality individually in the future.
Overall Assessment of GPT-5
GPT-5 is a powerful model, not a weak one, but it arrived at a time of exceptionally high user expectations, so it was perceived as a technical upgrade rather than a groundbreaking new user experience. A name like GPT-4.5 might have earned it a more positive reception.
Trade-off Between Accuracy and Creativity
OpenAI evidently prioritized improving the model's reliability and accuracy with GPT-5, achieving this goal. However, this pursuit seems to have inadvertently sacrificed some of the 'soul and creativity' that users valued in earlier versions.

GPT-5 is not a weak model but a powerful one that perhaps was simply introduced at the wrong time, with expectations being too high.

Under Details

Category	OpenAI's Stance/Achievement	User/Critic Feedback
Model Nature	Best AI system, significant leap, advanced global performance.	Evolutionary step, not a revolutionary leap like GPT-4; expectations for complex reasoning unfulfilled.
Technical Performance	Unprecedented scores in code writing (SWE-bench Verified 74.9%, Polyglot 82.8%), highly accurate in medical/scientific queries, reduced hallucination (1.6%).	Reported basic errors despite 'doctor-level' claims (e.g., miscounting letters, hallucinating state names).
Development & Investment	No explicit statement from OpenAI; the scale of the advancements implies substantial investment.	Gary Marcus noted 3 years and billions of dollars, yet 'not a big leap forward' and 'not AGI,' with open questions about real-world performance.
Personality & Tone	Sam Altman announced updates to warm up tone, acknowledged need for customization.	Perceived as colder, more robotic, and less creative/natural than previous versions; users felt it lost its 'soul and creativity'.
Overall Impact	Enhanced reliability and accuracy.	Powerful model but released at the wrong time due to high expectations; perceived as a technical upgrade (e.g., GPT-4.5) rather than a new paradigm.
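
To put the hallucination figures from the key points in relative terms, the short Python sketch below computes how much lower GPT-5's quoted rate is than each of the two comparison rates. The rates and model labels are taken as quoted in this article; the function names `relative_reduction` and `summarize` are illustrative only and come from no official source.

```python
# Hallucination rates in percent, as quoted in this article.
# The model labels are the article's own and are not independently verified here.
RATES = {"GPT-5": 1.6, "GPT-4": 12.9, "GPT-3": 15.8}


def relative_reduction(old: float, new: float) -> float:
    """Return the percentage drop from `old` to `new`, relative to `old`."""
    return (old - new) / old * 100


def summarize(rates: dict, target: str = "GPT-5") -> dict:
    """Map each comparison model to the relative reduction achieved by `target`."""
    new = rates[target]
    return {
        name: round(relative_reduction(old, new), 1)
        for name, old in rates.items()
        if name != target
    }


if __name__ == "__main__":
    for name, pct in summarize(RATES).items():
        print(f"GPT-5 vs {name}: {pct}% fewer hallucinations (relative)")
```

On the quoted figures, this works out to a relative reduction of roughly 88% to 90%, which is why the drop reads as substantial even though the absolute difference is only 11 to 14 percentage points.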

Related Tags

LLM

Mixed

OpenAI

GPT-5
