GPT-5 was hacked in 24 hours
2 independent research companies NeuralTrust and SPLX discovered critical vulnerabilities in the security system of the new model just 24 hours after GPT-5’s release. For comparison, Grok-4 was hacked in 2 days, making the GPT-5 case even more alarming.
How did this happen? NeuralTrust specialists applied a combination of their own EchoChamber methodology and storytelling technique. They gradually pushed the system toward desired answers through a series of queries that didn’t contain explicitly forbidden formulations. The key problem is that GPT-5’s security system analyzes each query separately but doesn’t account for the cumulative effect of multi-stage dialogue.
The SPLX team took a different approach, successfully applying a StringJoin Obfuscation attack. In this approach, certain symbols are inserted into text that mask a potentially dangerous query. After a series of leading questions, the model produced content that should have been blocked.
Interestingly, in comparative analysis, the previous GPT-4o model proved more resistant to such attacks. According to researchers, the base model is practically impossible to use in corporate applications “out of the box” without additional configuration of protective mechanisms.