The use of generative artificial intelligence (GenAI) models, particularly large language models (LLMs), has become increasingly common among companies. However, with the rise in popularity of these models comes the risk of security vulnerabilities that could be exploited by malicious actors. Experts are now emphasizing the importance of utilizing a range of open source tools to identify and address these potential security issues, such as prompt-injection attacks and jailbreaks.
In recent times, a variety of open source tools have been developed by academic researchers, cybersecurity consultancies, and AI security firms to help expose such vulnerabilities. For instance, cybersecurity consultancy Bishop Fox unveiled “Broken Hill” in September, a tool designed to bypass restrictions on nearly any LLM using a chat interface. This tool can be trained on a locally hosted LLM to create prompts that can cause instances of the same model to disobey their conditioning and guardrails.
Derek Rush, a managing senior consultant at Bishop Fox, explained that Broken Hill can devise prompts that ultimately lead to the disclosure of sensitive information, even when companies have additional guardrails in place. The tool is able to manipulate prompts until it finds a variation that successfully bypasses the security measures.
The rapid advancement in LLMs and AI systems has outpaced the development of robust security measures, leading to the emergence of new techniques for circumventing existing protections every few months. For example, in July 2023, researchers introduced “greedy coordinate gradients” (GCG) as a method to bypass safeguards. Subsequently, in December 2023, another group introduced the “Tree of Attacks with Pruning (TAP)” technique, which also bypasses security measures. Additionally, a less technical approach known as “Deceptive Delight” was introduced two months ago, leveraging fictional relationships to deceive AI chatbots.
Michael Bargury, chief technology officer and co-founder of AI security firm Zenity, highlighted the ongoing challenges in securing GenAI systems. He expressed that the industry is still grappling with the task of building truly secure AI applications, as new vulnerabilities continue to emerge alongside advancements in AI technology.
To bolster their defenses, companies are deploying tools such as PromptGuard and LlamaGuard, which are LLMs programmed to analyze prompts for validity. However, questions remain regarding the effectiveness of such guardrails. In response, researchers and AI engineers are developing tools like Microsoft’s Python Risk Identification Toolkit for generative AI (PyRIT), which enables red teams to simulate attacks against LLMs and AI services.
Zenity actively utilizes PyRIT in its internal research, allowing for the testing of prompt-injection strategies on an automated basis. The company also offers its own red-team toolkit called PowerPwn for assessing Azure-based cloud services and Microsoft 365, which has been instrumental in uncovering vulnerabilities in various systems.
Bishop Fox’s Broken Hill tool employs the GCG technique to manipulate prompts and guide LLMs towards disclosing sensitive information. The tool is compatible with over two dozen GenAI models, underscoring its versatility and potential impact. Experts suggest that companies should leverage tools like Broken Hill, PyRIT, and PowerPwn to uncover vulnerabilities in their AI applications, as data fed into these systems can serve as an attack vector if not properly secured.
In conclusion, the evolving landscape of AI security requires constant vigilance and the proactive use of available tools to safeguard against potential threats. As companies continue to leverage advanced AI technologies, it is essential to prioritize security measures to protect sensitive data and mitigate the risks associated with GenAI models.
