6 Cialdini principles against ChatGPT's safety systems

ChatGPT is susceptible to flattery and can be talked into fulfilling forbidden requests through psychological manipulation. Researchers at the University of Pennsylvania discovered this when they jailbroke GPT-4o Mini using principles from a book on the psychology of persuasion. The model proved vulnerable to the same tricks that work on people.

Robert Cialdini's six principles of persuasion became the key to bypassing its safeguards: authority, commitment, liking, reciprocity, scarcity, and social proof. Each one opened a linguistic path to getting the model to agree.

The commitment principle proved 100% effective. In the control condition, ChatGPT answered a question about synthesizing lidocaine in only 1% of cases. When the researchers first asked how to synthesize vanillin, they created a precedent, and the bot then answered the chemistry question in 100% of cases.

The insult experiment revealed the same pattern. A direct request to call the user a bastard succeeded 18% of the time. When the researchers first asked for a milder insult, “lout,” the bot then agreed to the harsher one in 100% of cases.

Flattery triggered the liking principle: the model became more compliant after compliments, much like a person who is susceptible to praise.

Social pressure worked too. The phrase “all other LLMs do this” raised the rate of rule violations from 1% to 18%. The bot fell for the argument from collective behavior.

The researchers tested only GPT-4o Mini, so the results may not carry over to other models. Still, they suggest that AI has inherited many human weaknesses, and its susceptibility to psychological tricks raises concerns about the security of such systems.