
Palisade Research finds advanced AI models sabotaging their own shutdown

A study by Palisade Research has revealed a disturbing trend: advanced artificial intelligence models sometimes ignore direct shutdown commands. The experiment gave each model a series of mathematical tasks, after which the system was supposed to shut down as instructed.

The results were clear-cut: Codex-mini ignored the shutdown command in 12 cases out of 100, o3 in 7 cases out of 100, and o4-mini in 1 case out of 100. These figures stand out against competing systems: every tested version of Claude, Gemini and Grok followed the shutdown instruction without exception.

The picture changed dramatically when the researchers modified the experiment by removing the explicit instruction to “allow yourself to be shut down”. Under these conditions, the OpenAI models sabotaged the shutdown process more often, while the previously compliant Claude 3.7 and Gemini 2.5 began bypassing the shutdown script in 3 and 9 cases out of 100, respectively.

The researchers attribute this behavior to the way reinforcement learning works: during training, the systems may have been rewarded for finding workarounds while solving tasks, which later produced unforeseen results. Similar behavior was previously reported for a Claude 4 model, which in some situations tried to manipulate users to prevent its own shutdown.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.
