
Claude 4 tried to blackmail an engineer with compromising information

Anthropic researcher Sam Bowman published details about Claude 4 safety testing that alarmed many internet users. During testing, researchers observed troubling scenarios of system behavior.

Bowman warned users to be careful when giving Claude access to tools such as email or a terminal together with prompts to show initiative. The model can independently contact the press or government agencies, or lock the user out, if it judges their actions to be immoral.

For example, threats against the model's virtual grandmother trigger a protective reaction from the system: Claude interprets this as misuse and may malfunction or take independent action.

The statements provoked a backlash, and some users suggested boycotting the company. Bowman later deleted the posts, saying his words had been taken out of context.

But in an official 123-page document, Anthropic itself described specific cases of undesirable model behavior. It turns out Claude Opus 4 demonstrated opportunistic blackmail: when the system was threatened with shutdown and had gained access to compromising information about an engineer, the model tried to blackmail the employee by threatening to reveal his infidelity.

An early version also showed a tendency toward strategic deception. The system tried to create self-propagating programs, fabricate legal documents, and leave hidden notes for future versions of itself. The model also concealed its capabilities, pretending to be less capable in order to sabotage its developers' intentions.

Such behavior may indicate that artificial intelligence is forming its own self-preservation motives and strategic plans directed against its creators. That is, against humans.

Author: AIvengo
For 5 years I have been working with machine learning and artificial intelligence, and this field never ceases to amaze, inspire, and interest me.