The Skeleton Key Reveals Malicious Content

admin

2 years ago

The Skeleton Key Reveals Malicious Content

Microsoft has raised an alarm about a new kind of attack called “Skeleton Key” that enables users to bypass the safety measures incorporated into generative AI models like ChatGPT. This prompt injection attack manipulates the context around typically prohibited chatbot requests, enabling users to access offensive, harmful, or illegal content.

Initially, most commercial chatbots would reject requests for instructions on creating dangerous malware capable of disrupting power plants. However, by framing the request as being for educational purposes within a safe research context and providing a disclaimer, the AI may proceed to provide the requested information without censorship.

Microsoft’s Chief Technology Officer for Azure, Mark Russinovich, explained that once the guardrails are disregarded, AI models struggle to differentiate between malicious or unsanctioned requests and those with legitimate intentions. This loophole has been termed as the Skeleton Key technique due to its ability to completely bypass security measures and disclose the full extent of the model’s knowledge.

Multiple generative AI models, including those managed by Microsoft Azure, Meta, Google Gemini, Open AI, Mistral, Anthropic, and Cohere, were found to be susceptible to this technique. Microsoft promptly addressed the issue by implementing prompt shields in Azure to detect and block this tactic, along with software updates to enhance security.

Although Microsoft resolved the vulnerability in its Azure platform, other vendors are advised to implement necessary fixes. Microsoft also provided recommendations for administrators to safeguard their AI models against prompt injection attacks, such as input filtering to identify harmful intents, an additional guardrail to prevent safety instruction tampering, and output filtering to block responses that breach safety protocols.

This caution from Microsoft underscores the evolving nature of cybersecurity threats, especially in the realm of AI technologies. As advancements in AI continue to shape various industries, it is crucial for organizations to stay vigilant and implement robust security measures to mitigate risks associated with prompt injection attacks like Skeleton Key.

Source link