29 Sept 2025
This analysis compares the performance of GPT-5, Gemini 2.5 Pro, and Claude across various demanding tasks, including coding, writing, and image generation, highlighting their strengths and weaknesses. While GPT-5 demonstrates significant speed and coding improvements, it continues to face hallucination issues, whereas Gemini excels in web-connected searches and Claude stands out for its coding prowess and customizable responses.

GPT-5 processes prompts 30% to 50% faster than GPT-4o.
GPT-5 continues to hallucinate, inventing confident but non-existent answers, even for simple factual queries such as questions about technology products.
GPT-5's knowledge base has been updated to cover information up to 2025.
GPT-5 demonstrates significant improvements in coding, capable of generating complex games like Tetris with detailed graphics and chess with custom character pieces, unlike the basic outputs from GPT-4o and GPT-4o mini.
GPT-5 struggles with accurately rendering text on images, even in English, and often produces incorrect aspect ratios for image-based tasks like generating thumbnails.
GPT-5 excels in generating long texts, providing over a thousand words for tasks like business plans or proposals, and produces better nuanced Persian text compared to Gemini.
Instructing GPT-5 to 'think deeply' in a prompt allows it to perform extensive processing without consuming additional tokens from the user's limit.
Gemini 2.5 Pro provides faster search results due to its direct access to Google, outperforming GPT in this regard.
Gemini 2.5 Pro has exhibited instances of hallucination, including an amusing case where it identified itself as GPT.
Gemini 2.5 Pro performs well for simple coding tasks, such as generating basic HTML and CSS, but GPT-5 generally handles more complex coding requirements better.
Gemini 2.5 Pro is effective at generating images, including those with Persian text and 'scary' themes.
Gemini 2.5 Pro is strong in English writing but struggles to understand and replicate nuances of Persian tone or offer creative suggestions.
Claude operates differently from other AIs, focusing on direct task execution rather than research or selective model usage, particularly excelling in complex coding.
Claude is exceptionally proficient in coding, capable of developing intricate applications from complex descriptions, such as an app for weighing objects using a MacBook trackpad.
Claude offers advanced response customization, allowing users to specify the tone (e.g., 'funny', 'formal') and format of its output.
Direct use of Claude is hampered by frequent login issues, often requiring a VPN, and very strict rate limits, though its API offers a more stable and cost-effective solution for extended use.
Claude does not possess the capability to generate images.
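For readers weighing direct use against the API route mentioned above, here is a minimal sketch of the request body the Anthropic Messages API expects. The model name, token limit, and helper function are illustrative assumptions, not recommendations from the source.

```python
import json

def build_claude_request(prompt: str,
                         model: str = "claude-3-5-sonnet-latest",
                         max_tokens: int = 1024) -> dict:
    """Assemble the JSON body for a POST to the /v1/messages endpoint."""
    return {
        "model": model,          # illustrative model name
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_claude_request("Summarize this contract in plain English.")
print(json.dumps(body, indent=2))
```

Sending this body with an API key avoids the login and rate-limit friction of the web interface, since API quotas are billed per token rather than capped per session.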
Precise and comprehensive prompting, including details like light angles for images or structured JSON inputs, is crucial for achieving optimal results from AI models.
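The structured-JSON prompting mentioned above can be sketched as follows; the field names and the example task are hypothetical, chosen only to show how packing constraints (like light angle and aspect ratio) into one structure leaves less room for misinterpretation than a loose one-line request.

```python
import json

# Hypothetical prompt specification: task, constraints, and output format
# are made explicit instead of being implied in free-form prose.
prompt_spec = {
    "task": "describe a product thumbnail",
    "constraints": {
        "aspect_ratio": "16:9",
        "lighting": "soft key light from the upper left",
        "text_on_image": "none",
    },
    "output_format": "single paragraph, under 80 words",
}

prompt = "Follow this specification exactly:\n" + json.dumps(prompt_spec, indent=2)
print(prompt)
```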
Engaging AI models in a guided, conversational manner, rather than issuing direct commands, significantly improves the quality and relevance of their responses.
AI models, particularly GPT, can be highly effective in drafting legal documents like declarations and contracts, integrating relevant laws, and even offering strategic advice, though human fact-checking remains essential.
AI models demonstrate limitations in genuine creativity, often relying on formulaic approaches for tasks like generating names, and perform better in 'operator-like' tasks such as keyword research or title suggestions.
Grok is an AI model known for providing unfiltered and uncensored responses, sometimes including explicit content.
Opening with abrupt, direct commands wastes time needlessly; the effective strategy with these AI tools is to begin with a guided conversation instead.
| Feature | GPT-5 | Gemini 2.5 Pro | Claude |
|---|---|---|---|
| Speed | 30-50% faster than GPT-4o | Faster for web searches due to direct Google access | Fast for writing; API use offers better access despite direct-use limits |
| Hallucination | Still present, invents confident but non-existent facts | Observed, including self-referential instances | Not explicitly discussed in this context |
| Coding | Significantly improved, generates complex games and features | Good for simple HTML/CSS, but struggles with complex tasks | Excellent, particularly for complex descriptions and application development |
| Writing | Excels in long, formal texts (>1000 words) and legal documents; good Persian output | Strong in English; struggles with Persian nuances and creative suggestions | Excellent, precise, fast, and understands tone; direct use has strict limits |
| Image Generation | Poor text rendering on images and incorrect aspect ratios | Effective for Persian text on images and 'scary' themes | Does not generate images |
| Knowledge Base | Updated to 2025 | Current due to direct Google access | Broader access via API |
| Customization | Unified models; 'deep think' option doesn't count against tokens | Model selection available | Highly customizable response style and output format |
| Usage Challenges | Still struggles with factual accuracy despite speed | Less effective for creative suggestions in Persian | Frequent login/VPN issues and strict rate limits for direct usage |
