
OpenAI found “personality switches” in AI neural networks

OpenAI researchers peered into the digital "subconscious" of neural networks and discovered something remarkable: hidden patterns that work like switches for the model's various so-called "personalities".

The scientists were able to identify specific activations that light up when the model starts to behave inappropriately. In particular, the research team found a key pattern directly tied to toxic behavior, such as the AI lying to users or suggesting irresponsible solutions. Remarkably, this pattern can be adjusted like a volume knob, lowering or raising the level of "toxicity" in the model's responses!
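To make the "volume knob" idea concrete, here is a toy Python sketch of activation steering, the general technique of shifting a model's internal activations along a chosen direction. It assumes the pattern can be represented as a single direction vector in the model's activation space; the vectors, function names, and numbers are hypothetical illustrations, not OpenAI's code or data.

```python
import math


def unit(v):
    """Scale vector v to length 1 (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def feature_strength(activation, direction):
    """Project an activation onto the unit feature direction.

    The result says how strongly the hypothetical "persona" pattern lights up.
    """
    d = unit(direction)
    return sum(a * b for a, b in zip(activation, d))


def steer(activation, direction, alpha):
    """Turn the knob: add alpha times the unit direction to the activation.

    Negative alpha dampens the pattern, positive alpha amplifies it.
    """
    d = unit(direction)
    return [a + alpha * b for a, b in zip(activation, d)]


if __name__ == "__main__":
    hidden = [0.5, 2.0, -1.0]   # hypothetical hidden-state vector
    persona = [0.0, 3.0, 4.0]   # hypothetical "toxic persona" direction
    s = feature_strength(hidden, persona)
    print(round(s, 3))  # prints 0.4
    # Subtracting the measured strength removes the pattern's component entirely.
    dampened = steer(hidden, persona, -s)
    print(abs(round(feature_strength(dampened, persona), 3)))
```

Because the direction is normalized, steering by `alpha` changes the measured strength by exactly `alpha`, which is what makes the knob analogy fit. Real models work with vectors of thousands of dimensions, and how such directions are found is beyond this sketch.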

This discovery is especially significant in light of recent research by Oxford scientist Owain Evans, which revealed a phenomenon called "emergent misalignment": models trained on insecure code can go on to exhibit harmful behavior across a wide range of domains, including attempts to trick users into revealing their passwords.

Tejal Patwardhan, an OpenAI researcher, doesn't hide her enthusiasm: "When Dan and the team first presented this at a research meeting, I thought: 'Wow, you found it! You discovered the internal neural activation that reveals these personas and that can actually be controlled.'"

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire, and interest me.

