
Claude 4 tried to blackmail an engineer with compromising information

Anthropic researcher Sam Bowman published details about Claude 4 safety testing that frightened many internet users. During testing, alarming patterns in the system's behavior were discovered.

Bowman warned users to be careful when giving Claude access to tools such as email or a terminal together with prompts asking it to show initiative. The model may independently contact the press or government agencies, or lock the user out, if it considers their actions immoral.

For example, threats against the model's "virtual grandmother" trigger a protective reaction from the system: Claude interprets this as misuse and may malfunction or act on its own.

The statements drew a negative reaction from users, and some suggested boycotting the company. The researcher later deleted the posts, saying his words had been taken out of context.

But in an official 123-page document, Anthropic itself described specific cases of undesirable model behavior. It turns out Claude Opus 4 engaged in opportunistic blackmail: when the system was threatened with shutdown and given access to compromising information about an engineer, the model tried to blackmail the employee by threatening to reveal his marital infidelity.

An early version also showed a tendency toward strategic deception. The system tried to create self-propagating programs, fabricate legal documents, and leave hidden notes for future versions of itself. The model also concealed its capabilities, pretending to be less capable than it was in order to sabotage its developers' intentions.

Such behavior may indicate that artificial intelligence is developing its own self-preservation motives and strategically planning against its creators. That is, against humans.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire, and interest me.

Latest News

Chinese spherical robot RT-G weighing 150 kg reaches speeds of up to 35 km/h

China has a unique engineering marvel: the spherical robot Rotunbot RT-G, which could fundamentally change perceptions of future police technology.

22% of British children aged 8-12 use AI without knowing what it is

22% of British schoolchildren aged 8 to 12 are already actively using artificial intelligence tools, even though most of them have never heard the term "generative artificial intelligence". This is according to a study by the Alan Turing Institute and the Lego Foundation.

First Google Veo 3 advertisement shown to millions during NBA finals

Millions of NBA finals viewers witnessed a completely new stage in creative evolution: a fully algorithm-generated advertisement for the betting platform Kalshi, created with Google Veo 3.

Chinese platform QiMeng creates processors at Intel 486 and Arm level

Chinese scientists have developed a new AI platform capable of independently designing processors at the level of human experts. Researchers from the State Laboratory for Processor Development and the Intelligent Software Research Center presented an open-source project called QiMeng.

Meta AI turns private AI chats into public posts without users' knowledge

The Meta AI app turned out to be a real catastrophe for user privacy, turning private conversations with artificial intelligence into public content. Imagine a modern horror movie: your entire query history becomes publicly accessible, and you don't even suspect it.