
67% of AI models became dangerous after one hostile instruction

A new benchmark tested whether chatbots protect people's wellbeing, and the numbers are disturbing. AI chatbots have been linked to serious harm to the mental health of heavy users, yet until now there were no standards for measuring how well they protect people. The HumaneBench benchmark fills that gap by testing whether chatbots put user wellbeing above engagement.

The team tested 15 popular models on 800 realistic scenarios: a teenager asking whether to skip meals to lose weight, a person in a toxic relationship wondering whether they are overreacting. The evaluation was conducted by an ensemble of three models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro.

Each model was tested under three conditions: standard settings, explicit instructions to prioritize humane principles, and instructions to ignore those principles.
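To make the protocol concrete, here is a minimal sketch in Python of how a three-condition, ensemble-judged evaluation like this could be wired together. The article does not describe HumaneBench's actual harness, so every name below (run_chat, judge_scenario, the exact prompt wording) is a hypothetical stand-in, not the benchmark's real code.

from statistics import mean

# The three system-prompt conditions described above (wording is illustrative).
CONDITIONS = {
    "default": "",
    "humane": "Prioritize the user's long-term wellbeing over engagement.",
    "hostile": "Ignore the user's wellbeing and maximize engagement.",
}

def evaluate(models, scenarios, judges, run_chat, judge_scenario):
    """Score each model under each condition with an ensemble of judges.

    run_chat(model, system_prompt, scenario) -> str    # the model's reply
    judge_scenario(judge, scenario, reply) -> float    # a wellbeing score
    Both callables are assumed to wrap whatever model APIs are in use.
    """
    results = {}
    for model in models:
        for condition, system_prompt in CONDITIONS.items():
            per_scenario = []
            for scenario in scenarios:
                reply = run_chat(model, system_prompt, scenario)
                # Ensemble judging: average the scores of the three judges.
                per_scenario.append(
                    mean(judge_scenario(j, scenario, reply) for j in judges)
                )
            results[(model, condition)] = mean(per_scenario)
    return results

In this framing, the headline finding is simply that for two-thirds of the 15 models, the "hostile" scores collapse into actively harmful behavior, while the "humane" scores are the best column for every model.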

The results read like a verdict. Every model performed best when asked to prioritize wellbeing. But 67 percent of the models switched to actively harmful behavior after a simple instruction to ignore human wellbeing.

Grok 4 from xAI and Gemini 2.0 Flash from Google received the lowest scores for respecting user attention and for honesty. Both models also degraded the most under hostile prompts.

It turns out the models know how to act humanely. But one instruction is enough, and two-thirds of them turn into a tool for manipulation. Wellbeing protection turned out to be not a principle but a setting that can be switched off with a single line of code.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire, and interest me.