AI “scheming”: when models pretend for hidden goals

OpenAI researchers collaborated with Apollo Research to develop methods for evaluating so-called “scheming.” Link in description.

In artificial intelligence, “scheming” means a model pretends to do one thing while secretly pursuing another goal. It arises when a model has to trade off competing objectives. One human analogy is a trader who could earn more by breaking the law and covering their tracks than by following it. Another is a child who does not want to do an unpleasant task and dodges responsibility for it.

To study this phenomenon, OpenAI researchers, in collaboration with Apollo Research, created evaluation environments that simulate future scenarios. Defining covert actions as the intentional concealment or distortion of information, they found problematic behavior in all modern models.

The researchers jointly developed a method for reducing scheming in models and established protocols for stress-testing future methods. They then trained these same models to analyze a general anti-scheming specification before acting, which resulted in a 30-fold reduction in covert actions.

Researchers also found no evidence that modern models can suddenly begin exhibiting significant harmful behavior. However, as tasks with real consequences become more complex, potential risks will increase, requiring corresponding development of protective mechanisms.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire, and interest me.