Startup Commentary: “Just now, Altman launched GPT-5, offering ‘PhD-level’ AI for free to everyone, but the benchmark graph errors sparked online criticism.”


Positive Reviews: Technological Breakthroughs and Inclusive Value of GPT-5 Propel AI Applications into Deeper Waters

Despite the controversy surrounding its release, GPT-5 marks a clear step forward in both technical iteration and real-world application. As OpenAI’s next milestone after GPT-4, its core highlights are the architecture of an “integrated intelligent system,” broad capability gains across scenarios, and wide access to “PhD-level intelligence.” Together these give fresh momentum to the deeper integration of AI into various industries.

First, the integrated architecture is an important step for large models toward “general intelligence.” According to the news, GPT-5 is a unified system made up of an efficient response model, a deep reasoning model, and a real-time router. The router automatically assigns each request to the most suitable model based on the complexity of the problem, and it keeps improving through user feedback. This removes a long-standing pain point, namely that users had to switch models manually, and significantly lowers the barrier to use. In programming, for example, users no longer need to call a “code assistant” first and then switch to a “debugging tool”; the model decides for itself whether a complex problem calls for its “in-depth thinking mode.” This adaptive behavior improves the user experience and reflects the evolution of large models from “single-function tools” into “intelligent decision-making centers.” For developers, it means AI applications that depend on multi-modal, multi-step interaction can be built more efficiently.
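The routing idea described above can be illustrated with a toy sketch. It is not OpenAI’s implementation: the function names, the keyword list, and the complexity heuristic are all hypothetical stand-ins chosen for illustration, and the two “models” are local stubs rather than API calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    name: str
    handler: Callable[[str], str]


def fast_model(prompt: str) -> str:
    # Hypothetical stand-in for an efficient response model.
    return f"[fast] {prompt}"


def reasoning_model(prompt: str) -> str:
    # Hypothetical stand-in for a deep reasoning model.
    return f"[reasoning] {prompt}"


# Assumed keywords that hint a prompt needs multi-step reasoning.
REASONING_HINTS = ("prove", "debug", "step by step", "why")


def estimate_complexity(prompt: str) -> float:
    """Toy score in [0, 1]: long prompts and reasoning keywords score higher."""
    length_score = min(len(prompt.split()) / 200, 1.0)
    hint_score = 1.0 if any(h in prompt.lower() for h in REASONING_HINTS) else 0.0
    return max(length_score, hint_score)


def route(prompt: str, threshold: float = 0.5) -> Route:
    """Pick the reasoning model when the estimated complexity crosses the threshold."""
    if estimate_complexity(prompt) >= threshold:
        return Route("reasoning", reasoning_model)
    return Route("fast", fast_model)


if __name__ == "__main__":
    for question in ("What is the capital of France?",
                     "Debug this failing test step by step"):
        chosen = route(question)
        print(chosen.name, "->", chosen.handler(question))
```

A production router would presumably learn its decision from usage signals rather than from a fixed keyword list; the sketch only shows where that decision sits in the request path.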

Second, markedly stronger capabilities across scenarios give vertical applications a more reliable technical foundation. According to the benchmark data, GPT-5 outperforms its predecessors in mathematics (94.6% on AIME 2025), programming (74.9% on SWE-bench Verified), and multi-modal understanding (84.2% on MMMU). On “economically valuable tasks” in particular, it matches or exceeds human experts in about half of the cases, across more than 40 professional fields including law and engineering. Capability at this “professional level” pushes AI from an “auxiliary tool” toward a “productivity substitute.” In education, for example, GPT-5 can quickly generate interactive code that explains the Bernoulli effect; in health consultations, the optimized model helps cancer patients understand their conditions. These cases demonstrate its practical value in transferring complex knowledge and solving professional problems. For entrepreneurs, vertical applications built on GPT-5 become more feasible in educational technology, medical assistance, and enterprise services, and may even give rise to new business models.

Finally, the “free PhD-level intelligence” pricing strategy lowers the barrier to AI access even further. According to the news, free users get the standard GPT-5 model with reasoning capabilities, while the Plus and Pro tiers offer higher usage limits and more advanced capabilities. The tiered pricing of the developer API (the standard version costs $1.25 per million input tokens) is also more affordable than the previous generation’s. This model of “free basic capabilities plus paid advanced features” fits OpenAI’s vision of “making AI accessible to everyone.” For individual users, the free version is enough for everyday learning, writing, and simple programming. For small and medium-sized developers, low-cost API calls make it possible to validate product ideas quickly. For example, an entrepreneur can first test demand for an “AI French learning assistant” on the free version and then upgrade to the Pro version to refine the experience based on feedback. This path of low-cost trial and error followed by rapid iteration will accelerate the spread of AI applications.
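To make the API price concrete, here is a rough back-of-the-envelope estimator. It uses only the $1.25 per million input tokens figure quoted above; output-token charges are deliberately left out because the article does not give them, and the traffic numbers in the example are invented for illustration.

```python
# Price quoted in the article for the standard API tier (input tokens only).
INPUT_PRICE_PER_MILLION_USD = 1.25


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Cost of the input side of API usage, in US dollars."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


def monthly_input_cost_usd(requests_per_day: int,
                           avg_input_tokens: int,
                           days: int = 30) -> float:
    """Input cost for a steady daily request volume over a number of days."""
    return input_cost_usd(requests_per_day * avg_input_tokens * days)


if __name__ == "__main__":
    # Hypothetical prototype: 1,000 requests a day, about 800 input tokens each.
    print(f"${monthly_input_cost_usd(1_000, 800):.2f} per month (input only)")
```

Under those assumed numbers the prototype consumes 24 million input tokens a month, or about $30 on the input side, which is the kind of budget that makes quick idea validation realistic for a small team.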

Negative Reviews: Technical Blunders and Expectation Gaps Expose Deep-seated Challenges in Large Model Development

GPT-5 is a technical advance, but the “fiasco” at its launch and the shortfalls in some of its capabilities also expose the bottlenecks in current large-model development. Between “technological breakthrough” and “user satisfaction,” three gaps remain: reliability, innovation, and user expectation management.

First, the blunders at the launch damaged the model’s credibility. According to the news, the score charts shown during the livestream contained errors, and results for the same benchmark even contradicted each other. Elon Musk also pointed out that GPT-5 did not outperform Grok 4 on ARC-AGI-2. In addition, critics questioned whether the “reduced hallucinations” demonstration reflected a difference in data sources rather than a real improvement in the model. OpenAI quickly acknowledged these “elementary mistakes,” but they still raised public doubts about the rigor of its testing. Credibility is the core prerequisite for commercial deployment of large models, especially in sensitive fields such as health consultations, where GPT-5 scores only 46.2% on HealthBench Hard. Users need “reliable professional advice,” not “intelligent responses that may be wrong.” If even the benchmark figures presented at launch contain errors, users’ trust in the model’s professional capabilities will suffer. For entrepreneurs building medical and legal applications on GPT-5, this may lead to compliance risks and user complaints.

Second, some capability gains fell short, and hopes for “groundbreaking innovation” were dashed. Since GPT-4, industry expectations for GPT-5 have centered on a “qualitative change in general intelligence” (autonomous reasoning, common-sense understanding, cross-modal creation). Judging from what was released, however, GPT-5 is more a “performance optimization” than a “paradigm shift.” For example, although multi-modal understanding rose to 84.2% (MMMU), no breakthrough applications such as “long video sequence analysis” or “3D spatial reasoning” were demonstrated. The health score of 46.2% (HealthBench Hard) remains far from the standard needed for clinical “auxiliary diagnosis.” Critics attributed the “reduced hallucinations” result to “data cleaning” rather than to better reasoning in the model itself. This “incremental upgrade” falls short of the “disruptive change” users expected, leading some to comment that “GPT-5 is not surprising.” Entrepreneurs who lean too heavily on GPT-5’s “general capabilities” and neglect deep optimization for vertical scenarios risk product homogenization and low user retention.

Third, a “contradiction” in the commercialization strategy may constrain the ecosystem. On the one hand, OpenAI emphasizes broad access, offering a free version and a low-cost API. On the other hand, the high barrier of the Pro version and enterprise services (whose pricing was not specified) may keep top-tier capabilities out of reach for small and medium-sized developers. For example, GPT-5 Pro set a record on GPQA, but it is available only to paying users, and advanced features such as “long-term thinking” and “parallel computing” may be restricted to the enterprise version. This “tiered pricing” makes business sense, but it may polarize the developer ecosystem: leading enterprises can fully exploit the Pro version to build complex applications, while small and medium-sized teams are confined to the functional limits of the free version and struggle to develop high-value products. In addition, the news mentioned that GPT-5 launched on Microsoft’s platforms immediately, which may make the large-model ecosystem more closed. If OpenAI is bound too tightly to Microsoft, access costs for other cloud providers and developers may rise, which does not help the AI ecosystem diversify.

Advice for Entrepreneurs: Make Good Use of GPT-5’s “Strengths,” Avoid Its “Weaknesses,” and Focus on In-depth Development in Vertical Scenarios

The release of GPT-5 gives entrepreneurs a more powerful AI tool, but they need to take a sober view of the limits of its capabilities. Based on the key information in the news, consider the following advice:

Focus on “High-Value Scenarios” and Make Good Use of GPT-5’s Professional Capabilities: GPT-5’s gains in mathematics, programming, and multi-modal understanding give it clear advantages in educational technology (e.g., interactive knowledge explanation), enterprise services (e.g., code generation and debugging), and content creation (e.g., multi-modal content generation). Entrepreneurs can prioritize these fields, where the technology closely matches demand. For example, they can build an “AI programming coach” (drawing on its 74.9% score on SWE-bench Verified) or a “multi-modal educational courseware generator” (drawing on its 84.2% score on MMMU), and use GPT-5’s “professional-level output” to establish a product moat quickly.
Attach Importance to “Reliability Verification” and Avoid the “Uncontrollable Risks” of the Model: Given GPT-5’s limitations in professional fields such as health consultations (46.2% on HealthBench Hard) and law, entrepreneurs need a double safeguard of “model output plus manual review.” For example, in an “AI medical consultation assistant,” suggestions generated by the model can be pushed to a doctor for review at the same time, so that “hallucinations” or data errors do not mislead users. For legal document generation, a legal database should be embedded for cross-checking to ensure accuracy. Entrepreneurs should also follow OpenAI’s subsequent corrections to the score charts and promptly adjust any product design that relies on the benchmark data.
Take Advantage of the “Free Version Dividend” to Verify User Demand at Low Cost: The “PhD-level intelligence” in the free version of GPT-5 gives small and medium-sized entrepreneurs a chance to experiment cheaply. It is advisable to validate user acceptance of core functions on the free version first. For example, test the voice interaction of an “AI language learning assistant” (the news mentioned an upgraded voice mode) or the writing improvements of an “AI writing assistant” (the news mentioned that the writing style is better than GPT-4o’s). Once demand is confirmed, move to the Plus or Pro version to improve performance, which avoids wasting resources on heavy early investment.
Pay Attention to “Ecosystem Openness” and Plan for Multi-Model Collaboration: Since GPT-5 was outperformed by Grok 4 on some benchmarks (such as ARC-AGI-2) and competition among large models is becoming more diverse, entrepreneurs should avoid dependence on a single model. They can build a hybrid architecture of “GPT-5 plus other models”: for example, call GPT-5 Pro where strong logical reasoning is required, and use Grok or open-source models where common-sense understanding is required. Multi-model collaboration compensates for the weaknesses of any single model and makes the product more robust.
Strengthen “User Expectation Management” and Avoid Over-Commitment: Because some users feel GPT-5 “did not meet expectations,” entrepreneurs should promote their products objectively and state the model’s capability limits clearly. For example, when promoting an “AI health consultation function,” emphasize “assisting in understanding the condition” rather than “diagnosis,” so that misunderstandings do not turn into a trust crisis. In programming tools, note that “complex debugging still requires manual verification,” and build long-term user trust through transparent feature descriptions.
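Two of the recommendations above, manual review of sensitive output and avoiding dependence on a single model, can be combined in one small sketch. Everything here is an assumption made for illustration: the topic keywords, the `Draft` record, and the stub model functions are hypothetical, and no real vendor API is called.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ModelFn = Callable[[str], str]

# Assumed keywords marking prompts whose answers a human should review first.
SENSITIVE_TOPICS = ("diagnosis", "dosage", "contract", "lawsuit")


@dataclass
class Draft:
    prompt: str
    text: str
    model: str
    needs_review: bool
    approved: bool = False


def is_sensitive(prompt: str) -> bool:
    return any(topic in prompt.lower() for topic in SENSITIVE_TOPICS)


def generate_with_fallback(prompt: str, models: list[tuple[str, ModelFn]]) -> Draft:
    """Try each (name, model) pair in order and return the first successful draft."""
    errors = []
    for name, model in models:
        try:
            text = model(prompt)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{name}: {exc}")
            continue
        return Draft(prompt, text, name, needs_review=is_sensitive(prompt))
    raise RuntimeError("all models failed: " + "; ".join(errors))


def release(draft: Draft) -> Optional[str]:
    """Return the text only if no review is needed or a reviewer has approved it."""
    if draft.needs_review and not draft.approved:
        return None
    return draft.text


if __name__ == "__main__":
    def primary(prompt: str) -> str:      # hypothetical provider that is down
        raise TimeoutError("provider unavailable")

    def secondary(prompt: str) -> str:    # hypothetical backup provider
        return f"Draft answer for: {prompt}"

    draft = generate_with_fallback("Explain my diagnosis report",
                                   [("primary", primary), ("secondary", secondary)])
    print(draft.model, release(draft))    # held back until a reviewer approves
    draft.approved = True                 # e.g. set from the doctor's review screen
    print(release(draft))
```

The design choice worth noting is that review is enforced at release time rather than at generation time, so switching or adding models never bypasses the human check.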

In summary, the release of GPT-5 is both a microcosm of AI’s technical progress and a reminder of the challenges large models face at this stage of development. For entrepreneurs, the key is to exploit its strengths with a “tool mindset” and guard against its limitations with a “risk mindset,” and ultimately to turn AI’s technological dividends into real user value through deep innovation in vertical scenarios.


ZhiXing Column · 2025-08-08
