In the wake of a much-maligned decision to do business with the Department of Defense, OpenAI is looking to course-correct and win back the public with the release of GPT-5.4, the company’s latest model. OpenAI called the model its “most capable and efficient frontier model for professional work” and claimed that it brings its advancements in reasoning, coding, and agentic workflows under one unified model.
GPT-5.4 is rolling out starting today and will be available in ChatGPT, Codex, and OpenAI’s API. GPT-5.4 Thinking will be available for Plus, Teams, and Pro users, and GPT-5.4 Pro will be available through the API, as well as for ChatGPT Enterprise and Edu subscribers.
According to OpenAI, GPT-5.4 is the first general-use model the company has released with native computer-use capabilities, meaning that it’s able to autonomously work across different applications on a machine on behalf of the user. The company said the model is able to write code to operate and execute tasks on computers, as well as issue keyboard and mouse commands to navigate the operating system.
That marks a noteworthy upgrade for agentic AI, and the company is touting its latest benchmarks to prove it. GPT-5.4 has reportedly claimed the top spot on Mercor’s APEX-Agents benchmark leaderboard, which tests a model’s performance on professional services work. OpenAI also said the model claimed the top spot on the OSWorld-Verified and WebArena Verified benchmarks, which focus on a model’s computer-use performance.
As for the general-purpose tasks that the average ChatGPT user is more likely to rely on, like asking questions, OpenAI says the newest model performs better there as well. The company claims GPT-5.4’s individual responses are 33% less likely to contain errors compared to responses from GPT-5.2, and the new model is 18% less likely to make mistakes overall. The company also said that hallucinations are less likely with GPT-5.4.
The company will have to hope the supposed improvements are enough to drive interest back to ChatGPT. The platform reportedly lost about 1.5 million users after OpenAI announced that it would offer its services to the Department of Defense following rival Anthropic’s very public refusal to ditch its safeguards to please the Pentagon. The decision didn’t just produce public backlash, but internal issues as well, with some employees openly expressing their opposition to working with the DoD. Fewer errors in ChatGPT’s responses probably won’t guarantee fewer errors in Sam Altman’s judgment, unfortunately.
