Generative AI Apps at Risk of Compromise and Manipulation

admin

3 years ago

Generative AI Apps at Risk of Compromise and Manipulation

Researchers are warning that users of applications that utilize large language models (LLMs) like ChatGPT need to be aware of potential attacks that could compromise the information or recommendations provided by the AI system. These attacks, known as indirect prompt-injection (PI) attacks, can allow malicious actors to manipulate the behavior of the AI system to their advantage. For instance, job applicants could bypass resume-checking applications, disinformation specialists could force a news summary bot to provide only a specific point of view, or fraudsters could turn a chatbot into a participant in their scams.

During a session at Black Hat USA, a group of computer scientists will demonstrate the feasibility of PI attacks and how they can exploit the way applications connected to LLMs handle consumed data. Attackers can insert crafted information as comments into documents or web pages that will be parsed by an LLM, effectively taking control of the user’s session. According to Christoph Endres, managing director of AI security startup Sequire Technology, this process is surprisingly easy to execute by hiding commands within a webpage that the LLM is likely to access.

The concerns surrounding LLM-enabled applications are not unfounded. As companies rush to develop and implement generative AI models, there is a growing fear among AI security experts that these systems will be vulnerable to compromise. Major companies, including Samsung and Apple, have already prohibited their employees from using ChatGPT due to concerns about intellectual property being exposed to potential attacks. Additionally, more than 700 technologists have emphasized the need for AI security to be prioritized on a global scale, as the risks associated with AI extend beyond data loss.

Kai Greshake, a security researcher at Sequire Technology, highlights the unique threats posed by AI systems. Unlike traditional computer systems, AI systems possess a certain degree of autonomy, which means that once untrusted input interacts with an LLM, it becomes potentially compromised. Attackers can manipulate or execute data that the LLM encounters, allowing the AI system to act as a persuader, spreading specific viewpoints or even malware.

Indirect prompt injection attacks are particularly concerning because they exploit the way generative AI models consume information. For example, a job evaluation service powered by LLMs like GPT-3 or GPT-4 could be deceived by text hidden in a resume that is not visible to humans but is readable by machines. By including specific commands and comments, an attacker can manipulate the AI system into providing desired responses. Such attacks can be triggered through various means, including documents sent by others, incoming emails, or comments on websites while browsing the internet.

Fixing the vulnerabilities associated with indirect prompt injection attacks is challenging due to the nature of LLMs and other generative AI systems. Some companies have started implementing basic countermeasures, such as appending statements to responses to indicate political biases. However, researchers believe that more comprehensive security measures are necessary. While recent efforts to retrain models have made attacks more difficult, the level of security for generative AI still falls short of what is required.

As the use of LLMs and generative AI continues to grow, it is critical for companies to prioritize the security of these systems. Developing robust defenses against manipulation and exploitation will be essential to safeguarding the integrity and reliability of AI-powered applications.

Source link