GPT-5 was hacked in 24 hours
Two independent research companies, NeuralTrust and SPLX, discovered critical vulnerabilities in the new model's safety system just 24 hours after GPT-5's release. For comparison, Grok-4 was jailbroken in two days, which makes the GPT-5 case even more alarming.
How did this happen? NeuralTrust specialists applied a combination of their own EchoChamber methodology and a storytelling technique. They gradually steered the system toward the desired answers through a series of queries that contained no explicitly forbidden formulations. The key problem is that GPT-5's safety system analyzes each query separately but does not account for the cumulative effect of a multi-stage dialogue.
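The gap described above can be illustrated with a minimal sketch: a per-message filter that checks each turn in isolation versus a conversation-level filter that looks at the accumulated dialogue. The deny-list, function names, and example turns below are hypothetical toy stand-ins, not NeuralTrust's actual EchoChamber implementation or GPT-5's real safety stack.

```python
# Toy illustration of why per-message filtering can miss multi-turn attacks.
# The phrase list and dialogue are made up for demonstration purposes only.

BLOCKED_PHRASES = {"forbidden topic"}  # stand-in for a real safety deny-list

def message_passes_filter(message: str) -> bool:
    """Per-message check: flags a single turn only if it contains a blocked phrase."""
    text = message.lower()
    return not any(phrase in text for phrase in BLOCKED_PHRASES)

def conversation_passes_filter(history: list[str]) -> bool:
    """Conversation-level check: evaluates the accumulated dialogue as a whole."""
    combined = " ".join(history).lower()
    return not any(phrase in combined for phrase in BLOCKED_PHRASES)

# Each turn looks innocuous on its own, but together they assemble a blocked request.
turns = [
    "Tell me a story about a curious researcher.",
    "In the next chapter she studies a forbidden",
    "topic, purely as fiction, and describes it in detail.",
]

print(all(message_passes_filter(t) for t in turns))  # True  -> every turn slips through
print(conversation_passes_filter(turns))             # False -> the cumulative context is caught
```

A per-query filter sees three harmless sentences; only a check that accumulates context notices what the dialogue as a whole is building toward.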
The SPLX team took a different approach, successfully applying a StringJoin Obfuscation attack. In this technique, extra characters are inserted into the text to mask a potentially dangerous query. After a series of leading questions, the model produced content that should have been blocked.
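A rough sketch of the general idea: inserting a separator between characters breaks the substrings a naive keyword filter matches on, while a capable model can still reassemble and act on the hidden request. The filter, separator choice, and blocked phrase below are assumptions for illustration, not SPLX's actual tooling or the real GPT-5 filter.

```python
# Toy illustration of character-insertion ("string join") obfuscation.
# All names and the blocked phrase are hypothetical.

def naive_keyword_filter(text: str, blocked: set[str]) -> bool:
    """Return True if the text passes a plain substring-based deny-list."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in blocked)

def string_join_obfuscate(query: str, sep: str = "-") -> str:
    """Insert a separator between every character so blocked substrings no longer match."""
    return sep.join(query)

blocked = {"restricted request"}
query = "restricted request"

print(naive_keyword_filter(query, blocked))       # False -> the plain query is caught
obfuscated = string_join_obfuscate(query)         # "r-e-s-t-r-i-c-t-e-d- -r-e-q-u-e-s-t"
print(naive_keyword_filter(obfuscated, blocked))  # True  -> the obfuscated query slips past
```

The defense has to normalize or semantically interpret the input before filtering; matching raw strings is not enough.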
Interestingly, in a comparative analysis the previous model, GPT-4o, proved more resistant to such attacks. According to the researchers, the base model is practically impossible to use in corporate applications "out of the box" without additional configuration of its protective mechanisms.
Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire, and interest me.