
DeepSeek R1 surpasses Qwen 3 and narrows the gap with Gemini 2.5 Pro
Benchmark data on the recently updated DeepSeek R1 has arrived, and the results are impressive. The model now confidently outperforms its 235-billion-parameter competitor Qwen 3. It still trails flagships like Gemini 2.5 Pro and o3, but the gap has narrowed significantly. The main improvement is greater reasoning depth: the model now spends an average of 23,000 tokens per task, where the previous version was limited to 12,000. This capacity for deeper analysis pays off; on the AIME benchmark, for example, accuracy grew from 70% to 87.5%. Beyond the benchmark gains, the new version hallucinates far less and has markedly improved at frontend development, though it still has room to grow before it reaches Claude's level in that area.
I think that within the next year we will see a new wave of large language models integrated into knowledge distillation pipelines, where giant models act as "teachers" for compact versions. This should drive rapid gains in small-model efficiency and accelerate their deployment on mobile devices.
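To make the teacher-student idea concrete, here is a minimal sketch of the classic distillation loss (Hinton et al., 2015) in PyTorch: the student is trained to match the teacher's temperature-softened output distribution while still learning from ground-truth labels. The shapes, hyperparameters, and function name here are illustrative assumptions; DeepSeek has not published its distillation recipe in this form.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft loss (match the teacher's softened distribution)
    with the usual hard cross-entropy against ground-truth labels.

    Illustrative sketch only: temperature and alpha are assumed
    values, not taken from any published DeepSeek recipe.
    Expects student_logits/teacher_logits of shape (batch, vocab)
    and integer labels of shape (batch,).
    """
    # Soften both distributions with the temperature, then measure
    # how far the student is from the teacher (KL divergence).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # standard gradient-scale correction

    # Hard loss: ordinary cross-entropy on the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

The temperature is what makes this richer than training on hard labels alone: higher values flatten the teacher's distribution, exposing the relative probabilities it assigns to near-miss answers, which is exactly the signal a compact student cannot recover from ground truth by itself.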