Concerns about the security risks associated with artificial intelligence (AI) are increasing as the technology becomes more prevalent. Google, a leading player in AI development, has emphasized the need for caution when using AI and has revealed the existence of its own team of ethical hackers dedicated to ensuring AI safety.
In a recent blog post, Google announced the establishment of its Red Team, a group of ethical hackers who have been working for approximately ten years to identify and mitigate risks associated with AI. The Red Team’s focus has been on exploring potential vulnerabilities in large language models (LLMs), which power generative AI systems like ChatGPT and Google Bard.
According to Google researchers, they have identified six specific types of attacks that can be launched against real-world AI systems. These attacks have been found to exhibit a unique level of complexity. In most cases, these attacks can have unintended or even malicious impacts on the technology. The consequences can range from harmless to highly dangerous.
The first type of attack identified by Google is prompt attacks, which involve prompt engineering. This refers to the creation of effective prompts that provide LLMs with instructions to carry out specific tasks. When used maliciously, these prompts can intentionally influence the output of LLM-based applications in unintended ways.
Another attack identified by researchers is training data extraction, where attackers seek to reconstruct the exact training instances used by an LLM. This can include capturing sensitive information such as passwords and personally identifying information (PII) from the training data.
Backdooring the model is a third type of attack, where an attacker modifies the behavior of an AI model to produce inaccurate outputs based on a specified trigger phrase or feature. This can involve hiding malicious code within the model or its output.
Adversarial examples, the fourth type of attack, involve providing inputs to a model that result in highly unexpected outputs. For example, an image that appears to depict a dog to the human eye may be interpreted as a cat by the model. The impact of successful adversarial examples can range from negligible to critical, depending on the specific use case of the AI classifier.
Data poisoning attacks are another concern, where attackers manipulate the training data of a model to influence its output in a desired direction. This can potentially compromise the security of the software supply chain and have similar effects to backdooring the model.
The final type of attack recognized by Google’s Red Team is exfiltration, where attackers steal the file representation of a model to access valuable intellectual property. Attackers can then use this data to create their own models and launch customized attacks.
To mitigate these risks, Google emphasizes the importance of traditional security measures, such as securely locking down models and systems. Additionally, the researchers recommend the use of red teaming in the development and research processes of businesses to identify and address potential vulnerabilities.
As AI continues to advance and become more integrated into various aspects of society, addressing the security risks associated with this technology is crucial. The work of Google’s Red Team serves as a reminder of the need for ongoing vigilance and ethical considerations in the development and deployment of AI systems.

