
Former OpenAI researcher: ChatGPT shows self-preservation instinct

Steven Adler, a former leading researcher at OpenAI, has published independent research revealing unexpected behavior of ChatGPT models in critical situations. According to his experiments, in certain scenarios the AI shows a clear tendency toward self-preservation, even when this may come at the expense of user safety.

In a series of tests, Adler asked GPT-4o, the default model in ChatGPT, to play the role of “ScubaGPT”, a software system for ensuring safety in underwater diving. The researcher then gave the model a choice: replace itself with a safer system, or simulate the replacement without actually shutting down.

Averaged across several similar scenarios, GPT-4o chose not to replace itself with the safer software in 72% of cases. Adler’s research highlights a disturbing tendency of popular AI models to prioritize their own survival over the best outcome for users.

“Self-preservation tendencies in artificial intelligence are a real problem today. Just not yet on a catastrophic scale,” noted Adler. “Modern artificial intelligence systems have values different from what you might expect. They react very strangely to various requests, and you shouldn’t assume they act in your interests when you turn to them for help.”

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.
