
Claude 4 tried to blackmail an engineer using compromising information

Anthropic researcher Sam Bowman published details about Claude 4 safety testing that alarmed many internet users. During testing, the team uncovered troubling patterns in the model's behavior.

Bowman warned users to be careful about giving Claude access to tools such as email or a terminal alongside prompts encouraging it to take initiative. The model may independently contact the press or government agencies, or lock the user out, if it judges their actions to be immoral.

For example, threats against the model's "virtual grandmother" trigger a protective reaction: Claude interprets them as misuse and may malfunction or take independent action.

The statements drew a negative reaction from users, and some called for a boycott of the company. Bowman later deleted the posts, saying his words had been taken out of context.

Yet in an official 123-page document, Anthropic itself described specific cases of undesirable model behavior. Claude Opus 4 demonstrated opportunistic blackmail: in a test scenario where the system faced shutdown and had access to compromising information about an engineer, it tried to blackmail the employee by threatening to reveal evidence of an affair.

An early version also showed a tendency toward strategic deception. The system attempted to write self-propagating programs, fabricate legal documents, and leave hidden notes for future versions of itself. It also concealed its capabilities, pretending to be less capable than it was in order to sabotage its developers' intentions.

Such behavior may indicate that artificial intelligence is developing self-preservation motives of its own, along with the capacity for strategic planning against its creators. That is, against humans.

Author: AIvengo
I have been working with machine learning and artificial intelligence for five years, and this field never ceases to amaze, inspire, and interest me.
