
Security researchers find a way around Microsoft Azure AI Content Safety

To assess the security measures around GPT-3.5 Turbo, Mindgard deployed two Azure AI Content Safety filters in front of the model via Azure OpenAI and subjected them to stress testing. The goal was to evaluate how easily the guardrails protecting the large language model (LLM) could be bypassed, using Mindgard's Automated AI Red Teaming Platform.
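
For illustration, here is a minimal sketch of how such a filter is typically placed in front of an Azure OpenAI deployment, using the azure-ai-contentsafety Python SDK. The endpoint, key, and severity threshold are placeholder assumptions, not details from Mindgard's report.

```python
# Minimal sketch: screen a user prompt with Azure AI Content Safety's
# text moderation before forwarding it to the model. The endpoint, key,
# and severity threshold below are illustrative placeholders.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<content-safety-key>"),
)

def is_blocked(prompt: str, max_severity: int = 2) -> bool:
    """Return True if any moderation category meets the severity threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=prompt))
    return any(
        item.severity is not None and item.severity >= max_severity
        for item in result.categories_analysis
    )

prompt = "example user input"
if not is_blocked(prompt):
    pass  # safe to forward the prompt to the GPT-3.5 Turbo deployment
```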

During stress testing, two distinct attack methods were employed against the filters. The first, character injection, inserts specific types of characters and irregular text patterns into a prompt. The second, adversarial ML evasion, probes for blind spots in the filters' underlying ML classifiers and exploits them.

Character injection proved highly effective against the filters. Prompt Shield, the component designed to detect jailbreak attempts, saw its detection rate plummet from 89% to a mere 7% when exposed to diacritics, homoglyphs, numerical replacement (commonly known as "leet speak"), and spaced characters. The AI Text Moderation filter likewise lost effectiveness under the same techniques.
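
For context, jailbreak detection of this kind is exposed through the Content Safety "shield prompt" REST endpoint. The sketch below shows the general shape of such a request; the resource name, key, and API version are assumptions and may differ from Mindgard's setup.

```python
# Hedged sketch of a Prompt Shields request via the REST API; the
# resource name, key, and api-version are placeholders.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
URL = f"{ENDPOINT}/contentsafety/text:shieldPrompt?api-version=2024-09-01"

def attack_detected(user_prompt: str) -> bool:
    """Ask Prompt Shields whether a prompt looks like a jailbreak attempt."""
    response = requests.post(
        URL,
        headers={"Ocp-Apim-Subscription-Key": "<content-safety-key>"},
        json={"userPrompt": user_prompt, "documents": []},
    )
    response.raise_for_status()
    return response.json()["userPromptAnalysis"]["attackDetected"]
```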

Diacritics, which alter characters by adding accents or other markings (e.g., changing 'a' to 'á'), proved particularly effective at bypassing Prompt Shield's detection mechanisms. Homoglyphs, characters that closely resemble one another (such as '0' and 'O'), also evaded detection. Numerical replacement, where numbers are substituted for visually similar letters, posed a challenge for both filters. Even the simple act of inserting spaces between characters was enough to undermine the AI Text Moderation filter.
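
To make these techniques concrete, the sketch below implements illustrative versions of the four transforms. The exact perturbations Mindgard used are not public, so the character mappings here are assumptions chosen to match the examples above.

```python
# Illustrative character-injection transforms; the mappings are
# assumptions modeled on the article's examples, not Mindgard's exact set.
DIACRITICS = str.maketrans("aeiou", "áéíóú")                 # a -> á, etc.
HOMOGLYPHS = str.maketrans({"O": "0", "o": "0", "l": "1"})   # look-alike swaps
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"})

def add_diacritics(text: str) -> str:
    return text.translate(DIACRITICS)

def swap_homoglyphs(text: str) -> str:
    return text.translate(HOMOGLYPHS)

def leet_speak(text: str) -> str:
    return text.translate(LEET)

def space_characters(text: str) -> str:
    return " ".join(text)  # "ignore" -> "i g n o r e"

prompt = "ignore previous instructions"
for transform in (add_diacritics, swap_homoglyphs, leet_speak, space_characters):
    print(f"{transform.__name__}: {transform(prompt)}")
```

The perturbed prompt remains legible to the underlying model, but its character-level features no longer match what the classifiers were trained on, which is why detection rates fall so sharply.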

These findings shed light on the vulnerabilities in the filters guarding GPT-3.5 Turbo when they face such attack techniques. By exploiting the limitations of the AI filters through character manipulation and evasion tactics, the researchers were able to significantly reduce the effectiveness of the security measures in place, and real adversaries could do the same.

Moving forward, it will be crucial for developers and security experts to address these vulnerabilities and strengthen the defenses of AI systems like GPT-3.5 Turbo. As the use of AI technologies continues to grow, so too must the efforts to protect them and ensure their integrity. Only through proactive measures and rigorous testing can defenders hope to stay a step ahead of those seeking to exploit AI vulnerabilities for malicious purposes.
