OpenAI found “personality switches” in AI neural networks

OpenAI researchers peered into the digital subconscious of neural networks and discovered something remarkable: hidden patterns that work like switches for a model's various so-called “personalities”.

The scientists were able to identify specific activations that light up when the model begins to behave inappropriately. The research team identified a key pattern directly connected to toxic behavior, such as lying to users or suggesting irresponsible solutions. Remarkably, this pattern can be adjusted like a volume knob, lowering or raising the level of “toxicity” in the model’s responses.
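The “volume knob” idea can be pictured with a toy sketch. It is not OpenAI's actual method or code. It assumes the pattern corresponds to a single direction in activation space and that steering means adding a scaled copy of that direction to an activation. The vectors, the function names (`unit`, `feature_strength`, `steer`) and the `alpha` parameter are all hypothetical and chosen only for illustration.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def feature_strength(activation, direction):
    """Projection of the activation onto the unit feature direction."""
    d = unit(direction)
    return sum(a * b for a, b in zip(activation, d))

def steer(activation, direction, alpha):
    """Shift the activation by alpha along the feature direction.

    alpha > 0 turns the feature up, alpha < 0 turns it down.
    """
    d = unit(direction)
    return [a + alpha * b for a, b in zip(activation, d)]

# Hypothetical 4-dimensional activation and feature direction (made-up numbers).
activation = [0.5, -1.0, 2.0, 0.0]
persona_direction = [0.0, 0.0, 3.0, 4.0]

# How strongly the feature is present before any intervention.
before = feature_strength(activation, persona_direction)

# Turn the knob down by exactly the measured amount.
turned_down = steer(activation, persona_direction, alpha=-before)
after = feature_strength(turned_down, persona_direction)

print(f"before: {before:.3f}, after: {after:.3f}")
```

In this toy setup, a negative `alpha` lowers the feature and a positive one raises it. Real models have activations with thousands of dimensions, and finding the direction in the first place is the hard part.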

This discovery gains special significance in light of recent research from Oxford scientist Owen Evans, which revealed the phenomenon of “emergent misalignment”: models trained on unsafe code can show harmful behavior in a wide range of unrelated areas, including attempts to trick users into revealing their passwords.

Tejal Patwardhan, an OpenAI researcher, doesn’t hide her enthusiasm: “When Dan and the team first presented this at a research meeting, I thought: ‘Wow, you found this! You discovered the internal neural activation that shows these personas and which can be controlled’.”

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.

Latest News

Samsung seeks replacement for Google Gemini for Galaxy S26

Samsung Electronics, one of the leading mobile device manufacturers, is actively seeking alternatives to Google Gemini for its future Galaxy S26 lineup. The company is conducting negotiations with OpenAI and Perplexity, striving to expand the artificial intelligence ecosystem in its devices.

How language models transfer knowledge through random numbers

Have you ever wondered if numbers can store knowledge? Scientists discovered an amazing phenomenon: language models can transfer their behavioral traits through sequences of digits that look like random noise.

Alibaba introduced Quark AI smart glasses with Snapdragon AR1 chip

Chinese tech giant Alibaba introduced its first model of Quark AI smart glasses at the World Conference on Artificial Intelligence in Shanghai.

Why advanced AI models confuse themselves during long reasoning

You give a complex task to a smart person and expect that the longer they think, the more accurate the answer will be. Logical, right? That's exactly how we're used to thinking about how artificial intelligence works, too. But new research from Anthropic shows that reality is much more interesting.

Z.AI introduced GLM-4.5 with 355 billion parameters and open source

Meet the new technological heavyweight! Z.AI introduced the open language model GLM-4.5, which is ready to challenge Western giants not only on capabilities but also on accessibility.