New benchmark shows AI failing at olympiad programming tasks

A new benchmark, LiveCodeBench Pro, has appeared for evaluating the programming capabilities of artificial intelligence. It includes the newest and most difficult tasks from popular competitions, including the International Olympiad in Informatics and the World Programming Championship. The tasks were annotated by winners and prize-winners of these competitions themselves.

The results paint an interesting picture. Even the best model, o4-mini-high, reaches a rating of only 2100. For comparison, grandmaster-level programmers are rated around 2700. The gap remains huge.

Models can cope only with simple tasks and some medium ones. On truly difficult problems, every language model scores exactly 0. They solve combinatorics and dynamic programming tasks quite well, but in game theory and in handling edge cases they perform at the level of an average expert or even a student.

The difference in error types is also telling. Humans usually make implementation errors caused by inattention or syntax problems. In AI models, the problems more often arise at the level of the solution idea itself. So no replacement for olympiad programmers is in sight yet.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.
