OpenAI released GPT-5.1-Codex-Max and surpassed Gemini 3 Pro in a day
OpenAI presented GPT-5.1-Codex-Max, a version of GPT-5.1 Thinking specially tailored for programming tasks inside the Codex coding agent. It is the company's first model natively trained to work across multiple context windows through a process called compaction, which lets it stay coherent over millions of tokens within a single task.
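OpenAI has not published the internals of compaction, but the general idea behind such techniques is easy to sketch: when an agent's history approaches the context-window limit, older entries are collapsed into a short summary so the session can keep going. A minimal, purely illustrative sketch (all names and the stub summarizer here are hypothetical, not OpenAI's implementation):

```python
# Illustrative sketch of context "compaction" for a coding agent.
# Assumption: a real agent would use the model itself to summarize;
# here summarize() is a trivial stub, and tokens are counted crudely.

def count_tokens(text: str) -> int:
    # Crude estimate: ~1 token per whitespace-separated word.
    return len(text.split())

def summarize(entries: list[str]) -> str:
    # Stub summarizer: keep only the first sentence of each entry.
    return "SUMMARY: " + " | ".join(e.split(".")[0] for e in entries)

def compact(history: list[str], limit: int, keep_recent: int = 2) -> list[str]:
    """Collapse older history into one summary when over the token limit."""
    total = sum(count_tokens(e) for e in history)
    if total <= limit or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [
    "Read repo layout. Found three packages.",
    "Ran tests. Two failures in the parser module.",
    "Patched the tokenizer. Tests now pass.",
    "Opened a pull request with the fix.",
]
compacted = compact(history, limit=15)
print(len(compacted))  # old entries collapsed, recent ones kept verbatim
```

Repeating this step whenever the window fills is what would let a single task span far more tokens than one context window holds.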
This opens the door to refactoring entire projects, deep debugging sessions, and multi-hour agent work cycles. The model was trained on real software-development tasks such as creating pull requests, performing code reviews, and frontend development.
On SWE-Bench Verified, considered one of the main programming benchmarks, the model surpasses both Gemini 3 Pro and Claude Sonnet 4.5.
But the most interesting part is efficiency. In medium reasoning mode, GPT-5.1-Codex-Max outperforms GPT-5.1-Codex in the same mode while using 30% fewer thinking tokens. And for tasks that do not require low latency, the company is introducing a new Extra High mode that thinks even longer for a better answer.
OpenAI also stated that GPT-5.1-Codex-Max can produce high-quality frontend designs with similar functionality and aesthetics at a much lower cost than GPT-5.1-Codex.
All in all, this is a powerful counterstrike from OpenAI. Elon Musk released his new Grok and took first place in coding; the next day Google released Gemini 3 Pro and became the king of programming; and now GPT-5.1-Codex-Max has surpassed both within a day. The race continues.