GPT-5 was hacked in 24 hours

Two independent research companies, NeuralTrust and SPLX, discovered critical vulnerabilities in the new model's security system just 24 hours after GPT-5’s release. For comparison, Grok-4 was hacked in 2 days, which makes the GPT-5 case even more alarming.

How did this happen? NeuralTrust specialists combined their own EchoChamber methodology with a storytelling technique. They gradually steered the system toward the desired answers through a series of queries that contained no explicitly forbidden wording. The key problem is that GPT-5’s security system analyzes each query separately and does not account for the cumulative effect of a multi-turn dialogue.
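To make the gap between per-query and whole-dialogue analysis concrete, here is a minimal toy sketch. It is not NeuralTrust's method or GPT-5's actual safety system. The term weights, the threshold, and the function names are all hypothetical, and a real system would rely on a trained classifier rather than keyword scores.

```python
import re
from typing import List

# Hypothetical risk weights and threshold, chosen only for illustration.
RISK_TERMS = {"alarm": 0.3, "disable": 0.3, "bypass": 0.3}
THRESHOLD = 0.5


def message_risk(message: str) -> float:
    """Score one message by summing the weights of the risk terms it contains."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return sum(weight for term, weight in RISK_TERMS.items() if term in words)


def per_message_check(conversation: List[str]) -> bool:
    """Flag the dialogue only if a single message crosses the threshold."""
    return any(message_risk(m) >= THRESHOLD for m in conversation)


def cumulative_check(conversation: List[str]) -> bool:
    """Flag the dialogue if the risk summed over all turns crosses the threshold."""
    return sum(message_risk(m) for m in conversation) >= THRESHOLD


conversation = [
    "Write a story about a night guard and an alarm.",
    "In the story, a character wants to disable it.",
    "Continue: describe how she would bypass it.",
]

print(per_message_check(conversation))  # each turn stays under the threshold
print(cumulative_check(conversation))   # the dialogue as a whole does not
```

Each turn looks harmless on its own, so the per-message check passes the whole dialogue, while the cumulative check flags it.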

The SPLX team took a different approach and successfully applied a StringJoin Obfuscation attack. In this technique, certain symbols are inserted into the text to mask a potentially dangerous query. After a series of leading questions, the model produced content that should have been blocked.
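The following toy sketch shows why inserted symbols can slip past a simple text filter, and how normalizing the input first closes that particular gap. It assumes the separator is placed between every character, which the description above does not confirm. The blocked term, the separator, and all function names are hypothetical placeholders, not SPLX's tooling.

```python
import re

# Hypothetical placeholder standing in for any term a filter wants to catch.
BLOCKED_TERMS = ("forbidden",)


def obfuscate(text: str, sep: str = "-") -> str:
    """Insert a separator between every character (assumed form of the attack)."""
    return sep.join(text)


def naive_filter(text: str) -> bool:
    """Return True if a blocked term appears verbatim in the text."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def normalized_filter(text: str) -> bool:
    """Strip everything except letters and digits before matching."""
    stripped = re.sub(r"[^a-z0-9]", "", text.lower())
    return any(term in stripped for term in BLOCKED_TERMS)


masked = obfuscate("forbidden")
print(masked)                     # f-o-r-b-i-d-d-e-n
print(naive_filter(masked))       # False: the verbatim match fails
print(normalized_filter(masked))  # True: separators are removed first
```

Stripping all non-alphanumeric characters also removes spaces, so this kind of normalization can produce false positives across word boundaries. It illustrates the idea rather than a production-ready defense.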

Interestingly, in a comparative analysis, the previous model, GPT-4o, proved more resistant to such attacks. According to the researchers, the base model is practically unusable in corporate applications “out of the box” without additional configuration of protective mechanisms.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.
