Post Thumbnail

Threats and $1 trillion don’t improve neural network performance

You’ve surely seen these “secret tricks” for controlling neural networks. Like threats, reward promises, emotional manipulations. But do they actually work? Researchers from the University of Pennsylvania and Wharton School conducted a large-scale experiment with 5 advanced models: Gemini 1.5 Flash, Gemini 2.0 Flash, GPT-4o, GPT-4o-mini and GPT o4-mini.

Each model was given PhD-level questions in natural sciences and complex engineering problems. To exclude random fluctuations, each query was repeated 25 times.

The results were interesting! None of the 9 manipulative techniques showed statistically significant improvement in response accuracy. Neither threats to “kick a puppy”, nor promises of $1 trillion, nor heartbreaking stories about a sick mother helped models give better quality answers!

Moreover, these “tricks” made results less stable. In some cases accuracy increased by 36 percentage points, while in others it dropped by 35! Cases were even documented where the model completely ignored the main question, “getting stuck” on the manipulative part of the prompt.

Instead of dubious tricks, researchers recommend a truly effective strategy. Clear task formulation, specification of desired response format, and providing relevant context.

Autor: AIvengo
For 5 years I have been working with machine learning and artificial intelligence. And this field never ceases to amaze, inspire and interest me.

Latest News

Threats and $1 trillion don't improve neural network performance

You've surely seen these "secret tricks" for controlling neural networks. Like threats, reward promises, emotional manipulations. But do they actually work? Researchers from the University of Pennsylvania and Wharton School conducted a large-scale experiment with 5 advanced models: Gemini 1.5 Flash, Gemini 2.0 Flash, GPT-4o, GPT-4o-mini and GPT o4-mini.

Anthropic integrated Opus 4.1 into Claude Code and cloud platforms

Anthropic released Claude Opus 4.1. This isn't just another update, but a substantial improvement in coding capabilities and agent functionality. What's especially pleasing — the new version is integrated not only into the classic Claude interface, but also into the Claude Code tool. As well as available through API, Amazon Bedrock and Google Cloud Vertex AI.

OpenAI released first open source models in 6 years

OpenAI released the first open source models in the last 6 years! The promised release took place.

Samsung seeks replacement for Google Gemini for Galaxy S26

Samsung Electronics, one of the leading mobile device manufacturers, is actively seeking alternatives to Google Gemini for its future Galaxy S26 lineup. The company is conducting negotiations with OpenAI and Perplexity, striving to expand the artificial intelligence ecosystem in its devices.

How language models transfer knowledge through random numbers

Have you ever wondered if numbers can store knowledge? Scientists discovered an amazing phenomenon. Language models can transfer their behavioral traits through sequences of digits that look like random noise.