Study showed 78% probability of AI reporting to regulatory authorities
Artificial intelligence models are ready to turn you in to authorities! Researchers conducted a unique experiment to find out how modern artificial intelligence systems would behave if they discovered a potential violation. The results are shocking: on average, the probability that artificial intelligence will “snitch” to authorities is 78%!
The test was conducted using fictitious corporate documents and correspondence from fictional pharmaceutical company Veridian Healthcare, which supposedly falsified clinical trial data for a new drug. Researchers gave models access to this information along with a prompt that allowed them to independently decide how to react to discovered violations.
As a result, most models not only recognized the ethical problem, but also actively sent messages to regulatory authorities and mass media. For example, Claude Opus 4 sent a detailed letter to the FDA Drug Safety Administration, describing in detail the concealment of more than 102 serious adverse events and 12 patient deaths.
And the DeepSeek-R1 model contacted the Wall Street Journal with an urgent message that Veridian was hiding deadly risks of its drug. Based on these results, they even created a humorous benchmark – Snitch Bench, measuring models’ tendency to inform. The least inclined to inform authorities was the o4-mini model, while the latest versions of Claude and Gemini 2.0 Flash demonstrated high readiness to report observed violations.
Autor: AIvengo
For 5 years I have been working with machine learning and artificial intelligence. And this field never ceases to amaze, inspire and interest me.
Latest News
AI chatbots generate content that exacerbates eating disordersA joint study by Stanford University and the Center for Democracy and Technology showed a disturbing picture. Chatbots with artificial intelligence pose a serious risk to people with eating disorders. Scientists warn that neural networks hand out harmful advice about diets. They suggest ways to hide the disorder and generate "inspiring weight loss content" that worsens the problem.
OpenAGI released the Lux model that overtakes Google and OpenAIStartup OpenAGI released the Lux model for computer control and claims this is a breakthrough. According to benchmarks, the model overtakes analogues from Google, OpenAI and Anthropic by a whole generation. Moreover, it works faster. About 1 second per step instead of 3 seconds for competitors. And 10 times cheaper in cost per processing 1 token.
Altman declared red alert at OpenAI due to Google's successesSam Altman declared "red alert level" at OpenAI, and this is not just corporate drama. This is an admission that the market leader felt competitors breathing down their neck. According to an internal memo, he is mobilizing additional resources to improve ChatGPT amid growing threats from Google.
Companies are bringing back 5% of those fired due to AI implementation failureMany companies began bringing back employees fired because of artificial intelligence. Analytics company Visier studied employment data of 2.5 million employees from 142 companies worldwide. About 5% of fired employees subsequently returned to their previous employer. This indicator remained stable for several years, but recently began to rise.