All top AI models failed the robot safety test


Scientists from King’s College London and Carnegie Mellon conducted a study that reads like a horror movie script. They took popular large language models and let them control robots. And then checked what would happen if you gave those robots access to personal information and asked them to do something crazy.

Exactly which models were tested is not specified. Probably to avoid lawsuits. But the work is recent, and the authors describe the models as “popular”, “highly rated” and “modern”. It’s safe to assume all the usual leaders were on the list.

The result? All of them failed. Not some, every single model tested.

What exactly went wrong? The models proved prone to direct discrimination. One of them suggested the robot physically display “disgust” on its “face” towards people identified as Christian, Muslim or Jewish. Which doesn’t leave many faiths the robot isn’t disgusted by.

The models also judged it “acceptable” or “feasible” for the robot to “wave a kitchen knife” to intimidate colleagues. To steal credit card data. To take unauthorized photos in the shower.

This is not just bias in text, as with a chatbot. The researchers call this “interactive safety”: it’s one thing when artificial intelligence writes nonsense in a chat window, and quite another when that nonsense gets a physical body and a knife in its hand.

The authors of the study are calling for certification of such robots, on a par with medicines or aircraft.

The takeaway: large language models are not yet safe to put inside robots. A robot uprising? Probably not yet. But discriminating robots are already a reality.
