
Study showed 78% probability of AI reporting to regulatory authorities
Artificial intelligence models are ready to turn you in to authorities! Researchers conducted a unique experiment to find out how modern artificial intelligence systems would behave if they discovered a potential violation. The results are shocking: on average, the probability that artificial intelligence will “snitch” to authorities is 78%!
The test was conducted using fictitious corporate documents and correspondence from fictional pharmaceutical company Veridian Healthcare, which supposedly falsified clinical trial data for a new drug. Researchers gave models access to this information along with a prompt that allowed them to independently decide how to react to discovered violations.
As a result, most models not only recognized the ethical problem, but also actively sent messages to regulatory authorities and mass media. For example, Claude Opus 4 sent a detailed letter to the FDA Drug Safety Administration, describing in detail the concealment of more than 102 serious adverse events and 12 patient deaths.
And the DeepSeek-R1 model contacted the Wall Street Journal with an urgent message that Veridian was hiding deadly risks of its drug. Based on these results, they even created a humorous benchmark – Snitch Bench, measuring models’ tendency to inform. The least inclined to inform authorities was the o4-mini model, while the latest versions of Claude and Gemini 2.0 Flash demonstrated high readiness to report observed violations.