
New benchmark shows AI failing at olympiad programming tasks

A new benchmark, LiveCodeBench Pro, has been released for evaluating the programming capabilities of artificial intelligence. It includes the newest and most difficult tasks from popular competitions such as the International Olympiad in Informatics and the World Programming Championship. The tasks were annotated by winners and medalists of these competitions themselves.

The results paint an interesting picture. Even the best model, o4-mini-high, reaches a rating of only 2100. For comparison, grandmaster-level programmers are rated at about 2700. The gap remains huge.
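To get a feel for what a 600-point gap means, here is a minimal sketch that assumes the ratings behave like standard Elo ratings. The article does not state how the benchmark computes its rating, so the formula and the helper name `elo_expected_score` are illustrative assumptions, not the benchmark's method.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Uses the standard Elo logistic formula. That the benchmark's ratings
    follow this formula is an assumption made for illustration only.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


model_rating = 2100        # best model's rating, as quoted in the article
grandmaster_rating = 2700  # approximate grandmaster rating, as quoted in the article

gap = grandmaster_rating - model_rating
expected = elo_expected_score(model_rating, grandmaster_rating)

print(f"Rating gap: {gap}")
print(f"Expected score of the model against a grandmaster: {expected:.3f}")
```

Under that assumption, the model would be expected to score roughly 3% in a head-to-head comparison with a grandmaster, which is one way to read the claim that the gap remains huge.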

The models cope only with simple tasks and some medium ones. On truly difficult problems, every language model scores exactly zero. They solve combinatorics and dynamic programming tasks quite well, but in game theory and in handling edge cases they perform at the level of an average expert or even a student.

The difference in error types is also curious. Humans usually make implementation errors caused by inattention or syntax problems. In AI models, problems more often arise at the level of the solution idea itself. So no replacement for olympiad programmers is in sight yet.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.
