
OpenAI tests models against specialists from 44 professions

OpenAI has introduced a new benchmark, GDPval, which compares the performance of its AI models with that of professionals from various industries. It is an attempt to understand how close OpenAI's systems are to surpassing humans in economically significant work.

The benchmark is based on the 9 industries that contribute the most to US gross domestic product. GDPval tests AI model performance across 44 professions in these industries, from programmers to nurses and journalists. Experienced professionals compared AI-generated reports with work produced by other specialists.

GPT-5 high was rated better than or equal to industry experts in 46.6% of cases. Claude Opus 4.1 from Anthropic was rated better than or equal to industry experts in 49% of tasks. However, OpenAI claims that Claude scored this high because of its tendency to create attractive graphics.
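To make the "better than or equal to" figure concrete, here is a minimal sketch of how such a share could be computed from grader verdicts. The function name, the verdict labels and the sample data are all assumptions made for illustration; they are not taken from GDPval or from OpenAI's scoring code.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Share of comparisons where the model's work was judged
    better than ("win") or equal to ("tie") the expert's work."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Hypothetical verdicts for ten tasks (made-up data, not GDPval results).
sample = ["win", "loss", "tie", "loss", "win",
          "loss", "loss", "tie", "loss", "win"]

print(f"{win_or_tie_rate(sample):.1%}")
```

Under this reading, ties count in the model's favor, so a score near 50% does not mean the model won half of the comparisons outright.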

I think such high model scores might be inflated by the test's limitations and may not reflect real performance. The new benchmark itself could create false expectations about AI capabilities in real working conditions.

Author: AIvengo

I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.

UBTech will send Walker S2 robots to serve on China's border for $37 million

Chinese company UBTech has won a $37 million contract and will send Walker S2 humanoid robots to serve on China's border with Vietnam. South China Morning Post reports that the robots will interact with tourists and staff, perform logistics operations, inspect cargo and patrol the area. Notably, they can change their own batteries.

Anthropic accidentally revealed an internal document about Claude's "soul"

Anthropic accidentally revealed the "soul" of artificial intelligence to a user. This is not a metaphor but a quite specific internal document.

Jensen Huang ordered Nvidia employees to use AI everywhere

Jensen Huang has announced a total mobilization under the banner of artificial intelligence inside Nvidia. This is no longer a recommendation but a requirement.

AI chatbots generate content that exacerbates eating disorders

A joint study by Stanford University and the Center for Democracy and Technology paints a disturbing picture: AI chatbots pose a serious risk to people with eating disorders. Scientists warn that neural networks hand out harmful advice about diets, suggest ways to hide the disorder and generate "inspiring weight loss content" that worsens the problem.

OpenAGI released the Lux model that overtakes Google and OpenAI

Startup OpenAGI has released Lux, a model for computer control, and claims it is a breakthrough. According to benchmarks, the model is a whole generation ahead of comparable models from Google, OpenAI and Anthropic. It is also faster, taking about 1 second per step instead of 3 seconds for competitors, and 10 times cheaper per token processed.