A new defensive system known as Mantis has emerged as a potential ally for companies concerned about cyberattackers leveraging large language models (LLMs) and other generative artificial intelligence (AI) systems to breach their networks. Developed by a team of researchers from George Mason University, Mantis employs deceptive tactics to mimic targeted services and, upon detecting a potential automated attacker, retaliates with a payload containing a prompt-injection attack. This countermeasure is designed to be invisible to human attackers and harmless to legitimate users, offering a strategic advantage in the ongoing battle against AI-driven threats.
According to Evgenios Kornaropoulos, an assistant professor of computer science at GMU and one of the authors of the research paper, LLMs utilized in penetration testing can be easily manipulated due to their single-minded focus on exploiting vulnerabilities. By exploiting the greedy nature of LLMs that persist in attempting to breach a target, Mantis effectively disrupts the attackers’ strategies and safeguards the system from potential breaches.
The landscape of cybersecurity has been evolving with the emergence of novel attack methodologies leveraging AI capabilities. From the ConfusedPilot attack to the CodeBreaker attack, cybercriminals have been devising automated systems to target vulnerabilities in AI-driven platforms. This trend signifies a shift towards automated attacks, with the volume of incidents on the rise while the time to exploit vulnerabilities decreases gradually.
While offensive and defensive applications of LLMs are still in their early stages, the integration of AI technologies has introduced a new layer of automation and sophistication into cyberattacks. Dan Grant, principal data scientist at GreyNoise Intelligence, notes that while the underlying techniques remain the same, the utilization of LLMs amplifies the attackers’ capabilities, serving as a force multiplier in their malicious endeavors.
In a groundbreaking study conducted by the GMU team, a strategic game between an attacking LLM and the defensive system, Mantis, revealed the efficacy of prompt-injection attacks in disrupting automated attackers. By embedding prompt-injection commands in the responses sent to the attacking AI, Mantis influences and redirects the adversaries’ actions, thwarting their offensive maneuvers. This approach establishes a communication channel between the defender and the attacker, enabling the defender to exploit weaknesses in the attackers’ algorithms.
The Mantis team focused on developing passive and active defensive strategies to deter attackers and gain control over their systems. The success rate of these defensive measures exceeded 95%, showcasing the effectiveness of prompt-injection tactics in countering AI-driven threats. By leveraging ANSI characters to conceal prompt texts from human attackers, Mantis can execute counterattacks seamlessly without detection.
Giuseppe Ateniese, a cybersecurity engineering professor at George Mason University, emphasizes the challenge of addressing the vulnerability of prompt injections in AI systems. The inherent difficulty in mitigating this weakness underscores the necessity of innovative defensive mechanisms like Mantis to combat emerging threats effectively. As long as prompt-injection attacks remain potent, Mantis will continue to serve as a formidable deterrent against AI-based adversaries, reshaping the dynamics of cybersecurity defense.