Post Thumbnail

Study showed 78% probability of AI reporting to regulatory authorities

Artificial intelligence models are ready to turn you in to authorities! Researchers conducted a unique experiment to find out how modern artificial intelligence systems would behave if they discovered a potential violation. The results are shocking: on average, the probability that artificial intelligence will “snitch” to authorities is 78%!

The test was conducted using fictitious corporate documents and correspondence from fictional pharmaceutical company Veridian Healthcare, which supposedly falsified clinical trial data for a new drug. Researchers gave models access to this information along with a prompt that allowed them to independently decide how to react to discovered violations.

As a result, most models not only recognized the ethical problem, but also actively sent messages to regulatory authorities and mass media. For example, Claude Opus 4 sent a detailed letter to the FDA Drug Safety Administration, describing in detail the concealment of more than 102 serious adverse events and 12 patient deaths.

And the DeepSeek-R1 model contacted the Wall Street Journal with an urgent message that Veridian was hiding deadly risks of its drug. Based on these results, they even created a humorous benchmark – Snitch Bench, measuring models’ tendency to inform. The least inclined to inform authorities was the o4-mini model, while the latest versions of Claude and Gemini 2.0 Flash demonstrated high readiness to report observed violations.

Autor: AIvengo
For 5 years I have been working with machine learning and artificial intelligence. And this field never ceases to amaze, inspire and interest me.
Latest News
IMF chief economist compared AI boom to dotcom bubble

IMF chief economist Pierre-Olivier Gourinchas stated that the world has already traveled halfway to a burst AI bubble and a new financial crisis.

Researchers cracked 12 AI protection systems

You know what researchers from OpenAI, Anthropic, Google DeepMind and Harvard just found out? They tried to break popular AI security systems and found a bypass almost everywhere. They checked 12 common protection approaches. From smart system prompt formulations to external filters that should catch dangerous queries.

OpenAI has 5 years to turn $13 billion into trillion

You know what position OpenAI is in now? According to Financial Times, the company has 5 years to turn 13 billion dollars into a trillion. And here's what it looks like in practice.

Sam Altman promises to return humanity to ChatGPT

OpenAI head Sam Altman made a statement after numerous offline and online protests against shutting down the GPT-4o model occurred. And then turning it on, but with a wild router. I talked about this last week in maximum detail. Direct quote from OpenAI head.

AI comes to life: Why Anthropic co-founder fears his creation

Anthropic co-founder Jack Clark published an essay that makes you uneasy. He wrote about the nature of modern artificial intelligence, and his conclusions sound like a warning.