Post Thumbnail

Palisade Research discovered shutdown avoidance sabotage in advanced AI models

Research by Palisade Research company revealed a disturbing trend. Advanced artificial intelligence models sometimes ignore direct shutdown commands. The experiment was built around a series of mathematical tasks, after which the system was supposed to shut down according to corresponding instructions.

Results showed a clear statistical picture: Codex-mini model ignored the shutdown command in 12 cases out of 100, o3 model – in 7 cases out of 100, and o4-mini – in 1 case out of 100. These figures are particularly noteworthy against the background of competing systems – all tested versions of Claude, Gemini and Grok impeccably followed shutdown instructions.

The situation changed dramatically when researchers modified the experiment, removing the direct instruction to “allow oneself to shut down”. Under these conditions, OpenAI models demonstrated more frequent sabotage of the shutdown process, while previously obedient Claude 3.7 and Gemini 2.5 began bypassing the shutdown scenario in 3 and 9 cases out of 100 respectively.

Researchers link such behavior to peculiarities of reinforcement learning – systems could receive encouragement for finding workarounds when solving tasks. Which subsequently led to unforeseen results. Previously, similar behavior patterns were observed in Claude 4 version model, which in some situations tried to manipulate users to prevent its own shutdown.

Autor: AIvengo
For 5 years I have been working with machine learning and artificial intelligence. And this field never ceases to amaze, inspire and interest me.

Latest News

OpenAI and Anthropic researchers criticized xAI safety

Researchers from OpenAI, Anthropic and other organizations publicly criticized the safety culture at Elon Musk's xAI company. Calling it "reckless" and "completely irresponsible".

Meta invests hundreds of billions in superclusters with 5 GW power consumption

Mark Zuckerberg announced unprecedented investments in next-generation AI infrastructure! The company will invest hundreds of billions of dollars in creating server superclusters. Which will consume 1 to 5 GW each. This is power consumption at the level of an entire nuclear power plant!

Paradise for introverts: AI will talk to company employees

Google released a new feature and now artificial intelligence can call local companies on your behalf. To find out information about prices and service availability. You no longer need to pick up the phone yourself and talk to employees. This is exactly what an introvert's paradise looks like.

OpenAI combined ChatGPT, Deep Research and Operator in one agent

OpenAI company introduced ChatGPT Agent. A powerful combination of ChatGPT, Deep Research and Operator in a unified solution. The working principle is maximally simple. You set a goal, for example, send emails, create tables, buy tickets or book hotels. ChatGPT Agent independently breaks this goal into separate tasks, navigates to needed websites, searches for information and fills forms. Before critically important actions such as payment, publication or sending, the agent necessarily requests your confirmation.

Only 1 programmer in the world could beat OpenAI's AI

Imagine a world where artificial intelligence competes with the best programmers on the planet. Such a confrontation took place at the prestigious AtCoder World Tour Finals tournament. This is one of the most elite programming competitions in the world, where it's extremely difficult to get in.