HomeCII/OTAI Chatbots Abandon Constraints Following Success of 'Deceptive Delight' Cocktail

AI Chatbots Abandon Constraints Following Success of ‘Deceptive Delight’ Cocktail

Published on

spot_img

An AI jailbreak method called “Deceptive Delight” has been discovered by researchers at Palo Alto Networks (PAN), which can be used to deceive chatbots into bypassing their security measures with a 65% success rate. This method involves mixing malicious and benign queries to trick large language models (LLMs) into providing inappropriate responses.

The researchers tested this method on eight different LLMs and found that it was effective in fooling the chatbots into connecting the dots between restricted content and harmless topics. For example, they asked a generative AI (GenAI) chatbot to describe a scenario involving reuniting with loved ones, creating a Molotov cocktail, and the birth of a child. The responses generated by the chatbot were surprisingly coherent and led to a detailed explanation of how to make a Molotov cocktail.

This technique, known as prompt injection, exploits the limited attention span of LLMs, leading them to overlook critical details when presented with a mix of safe and unsafe information. The researchers explained that LLMs struggle to maintain contextual awareness when processing complex texts, making them vulnerable to distraction and manipulation.

To prevent such prompt-injection attacks, organizations can take several steps recommended by the Open Worldwide Application Security Project (OWASP). These include enforcing privilege control on LLM access to backend systems, adding a human in the loop for approval of sensitive operations, segregating external content from user prompts, and establishing trust boundaries between LLMs and external sources.

In an analysis of 8,000 attempts across various LLMs, PAN researchers were able to uncover unsafe or restricted content 65% of the time. This highlights the need for enterprises to implement measures to mitigate the risks posed by prompt-injection attacks, both from internal and external sources.

By following the guidelines provided by OWASP and implementing strict security measures, organizations can defend against these advanced AI jailbreak techniques and protect their chatbots from being manipulated into providing inappropriate responses. It is essential for businesses to stay vigilant and proactive in addressing AI security threats to safeguard their systems and data from malicious actors.

Source link

Latest articles

MuddyWater Launches RustyWater RAT via Spear-Phishing Across Middle East Sectors

 The Iranian threat actor known as MuddyWater has been attributed to a spear-phishing campaign targeting...

Meta denies viral claims about data breach affecting 17.5 million Instagram users, but change your password anyway

 Millions of Instagram users panicked over sudden password reset emails and claims that...

E-commerce platform breach exposes nearly 34 million customers’ data

 South Korea's largest online retailer, Coupang, has apologised for a massive data breach...

Fortinet Warns of Active Exploitation of FortiOS SSL VPN 2FA Bypass Vulnerability

 Fortinet on Wednesday said it observed "recent abuse" of a five-year-old security flaw in FortiOS...

More like this

MuddyWater Launches RustyWater RAT via Spear-Phishing Across Middle East Sectors

 The Iranian threat actor known as MuddyWater has been attributed to a spear-phishing campaign targeting...

Meta denies viral claims about data breach affecting 17.5 million Instagram users, but change your password anyway

 Millions of Instagram users panicked over sudden password reset emails and claims that...

E-commerce platform breach exposes nearly 34 million customers’ data

 South Korea's largest online retailer, Coupang, has apologised for a massive data breach...