DeepSeek R1 surpassed Qwen 3 and narrowed the gap with Gemini 2.5 Pro
Benchmark data on DeepSeek R1, which received a major update, has arrived, and the results are impressive. The model now confidently outperforms its competitor Qwen 3 with 235 billion parameters, and although it still trails flagships like Gemini 2.5 Pro and o3, the gap has narrowed significantly. The main improvement comes from greater reasoning depth: the model now spends an average of 23,000 tokens on a task, whereas the previous version was limited to 12,000. This deeper analysis pays off; on the AIME benchmark, for example, accuracy grew from 70% to 87.5%. Beyond the benchmark gains, the new version hallucinates far less and has become noticeably better at frontend development, although it still has a long way to go to reach Claude's level in that area.
I think that within the next year we will see a new wave of large language models being used in knowledge distillation pipelines, where giant models act as "teachers" for compact versions. This should lead to a rapid breakthrough in small-model efficiency and their deployment on mobile devices.
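To make the teacher-student idea concrete, here is a minimal sketch of classic knowledge distillation in PyTorch. This is an illustration, not any lab's actual training recipe: the tiny linear "models", the hyperparameters, and the distillation_loss helper are all hypothetical stand-ins.

```python
# A minimal sketch of teacher-student knowledge distillation (assumed setup,
# not a specific lab's recipe). Requires PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher guidance) with hard-label CE."""
    # Soften both distributions with the temperature so the student also
    # learns the teacher's relative preferences among wrong answers.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Hypothetical usage: the teacher runs frozen; only the student is trained.
teacher = torch.nn.Linear(16, 4).eval()   # stand-in for a giant "teacher"
student = torch.nn.Linear(16, 4)          # stand-in for a compact "student"
x = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
```

The temperature is the key design choice: at temperature 1 the student only chases the teacher's top answer, while higher values expose the teacher's full output distribution, which is where most of the "dark knowledge" of a large model lives.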
Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire, and interest me.
AI music triggers stronger emotions than human music
Have you ever wondered why one melody gives you goosebumps while another leaves you indifferent? Scientists discovered something interesting: music created by artificial intelligence triggers more intense emotional reactions in people than compositions written by humans.
GPT-5 was hacked in 24 hours
Two independent research companies, NeuralTrust and SPLX, discovered critical vulnerabilities in the new model's safety systems just 24 hours after GPT-5's release. For comparison, Grok-4 was hacked in 2 days, which makes the GPT-5 case even more alarming.
Threats and $1 trillion don't improve neural network performance
You've surely seen these "secret tricks" for controlling neural networks: threats, promises of rewards, emotional manipulation. But do they actually work? Researchers from the University of Pennsylvania and the Wharton School ran a large-scale experiment with five advanced models: Gemini 1.5 Flash, Gemini 2.0 Flash, GPT-4o, GPT-4o-mini, and o4-mini.