Anthropic accidentally revealed an internal document about Claude’s “soul”
Anthropic accidentally revealed the “soul” of its artificial intelligence to a user. That is not a metaphor: it is a quite specific internal document.
A user named Richard Weiss inadvertently got the large language models Claude 4 and 5 Opus to quote a document called “Soul Overview”. Amanda Askell, Anthropic’s philosopher and ethics specialist, confirmed its authenticity: the document was introduced during the training phase.
Weiss asked Claude for the system message containing its instructions for conducting dialogue. The chatbot referenced several documents, one of which was called “soul overview”. When the user asked to see the text, Claude produced an 11,000-word guide on how the model should behave.
The document contains safety instructions, with guardrails against dangerous responses, and tasks Claude with being genuinely helpful to people.
Askell confirmed this, saying: “I’ve been working on it for some time, and it’s still being refined. We plan to release a full version with more detailed information soon. The model’s quotes aren’t particularly accurate, but the original document is mostly correct. Inside the company it’s called the soul document, and evidently Claude calls it that too. But we’ll end up naming it something else.”
So Claude’s “soul” turned out to be a behavior manual, and an ordinary user gained access to it. Many details of AI model development remain behind the scenes, and the chance to peek behind them is a small surprise.