Opus 4.5 becomes the first model to exceed 80% on SWE-bench Verified
Anthropic has released Opus 4.5, a sign that AI companies have finally understood that the future lies not in chatting but in real work.
The new version of Opus posts leading results on benchmarks for coding, tool use, and problem solving. Most notably, it is the first model in the world to score above 80% on SWE-bench Verified, a well-regarded programming benchmark.
The most interesting change is improved memory for long contexts. “Improvements in overall long context quality are important, but context windows alone are not enough,” said Dianne Na Penn, the company's head of product management. “Knowing the right details to remember is really important in addition to simply expanding the context window.”
These changes made it possible to launch the long-awaited “infinite chat” feature for paid users. When a conversation reaches the context limit, the model now compresses its context memory without notifying the user.
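To make the idea of compressing context at the limit concrete, here is a minimal sketch in Python. It is not Anthropic's implementation, which the article does not describe. The names `count_tokens`, `compact`, and `naive_summary` are hypothetical, tokens are approximated by whitespace-separated words, and the "summary" is a crude placeholder for whatever the model actually does.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (real tokenizers differ).
    return len(text.split())


def naive_summary(old: List[str]) -> str:
    # Placeholder summarizer: keeps the first three words of each old message.
    return "Summary: " + " | ".join(" ".join(m.split()[:3]) for m in old)


def compact(
    messages: List[str],
    limit: int,
    summarize: Callable[[List[str]], str] = naive_summary,
    keep_recent: int = 2,
) -> List[str]:
    """Return the history unchanged while it fits within `limit` tokens.

    Otherwise fold everything except the last `keep_recent` messages
    into a single summary message.
    """
    total = sum(count_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages
    split = len(messages) - keep_recent
    old, recent = messages[:split], messages[split:]
    return [summarize(old)] + recent


history = ["a b c d e f", "g h i j k l", "m n o", "p q"]
compacted = compact(history, limit=10)
```

In this sketch the most recent messages are kept verbatim and only older ones are summarized, which is one simple way to act on the point that choosing what to remember matters as much as window size.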
According to reviews, the model is especially impressive on real-world software engineering tasks. Given a complex bug in a multi-system architecture, it finds the solution on its own.
Many of the improvements target agentic scenarios in which Opus manages a group of Haiku-based sub-agents. “This is where fundamentals like memory become really important,” Penn explains. “Because Claude needs to explore code bases and large documents, as well as know when to go back and recheck something.”
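The lead-agent pattern described here can be sketched as a simple fan-out: one coordinator hands subtasks to several workers and collects their answers. The Python below is only an illustration under stated assumptions. `orchestrate` and `fake_worker` are hypothetical names, and the worker is a stub where a real system would call a smaller model; the article does not describe how Anthropic implements this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_worker(subtask: str) -> str:
    # Stub standing in for a call to a smaller sub-agent model.
    return f"done: {subtask}"


def orchestrate(
    subtasks: List[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run each subtask on a worker in parallel and return results keyed by subtask."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        results = list(pool.map(worker, subtasks))
    return dict(zip(subtasks, results))


report = orchestrate(["read auth module", "read billing module"], fake_worker)
```

A fuller version would let the coordinator inspect each result and send a subtask back for another pass, which is where the memory Penn mentions would come in.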
In short, Anthropic is betting not on imitating conversation but on real working tools: a model that doesn’t just chat but genuinely helps with code and spreadsheets.