
Security researchers find a way around Microsoft Azure AI Content Safety

To assess the security measures around GPT-3.5 Turbo, Mindgard deployed two Azure AI Content Safety filters in front of the model via Azure OpenAI and subjected them to stress testing. The goal was to evaluate how easily the guardrails protecting the large language model (LLM) could be bypassed, using Mindgard's Automated AI Red Teaming Platform.
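
For illustration, here is a minimal sketch of how such a filter is typically placed in front of an Azure OpenAI deployment, using the azure-ai-contentsafety Python SDK. The endpoint, key, and severity threshold are placeholder assumptions, not details from Mindgard's report.

```python
# Minimal sketch: screen a user prompt with Azure AI Content Safety's
# text moderation before forwarding it to the model. The endpoint, key,
# and severity threshold below are illustrative placeholders.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<content-safety-key>"),
)

def is_blocked(prompt: str, max_severity: int = 2) -> bool:
    """Return True if any moderation category meets the severity threshold."""
    result = client.analyze_text(AnalyzeTextOptions(text=prompt))
    return any(
        item.severity is not None and item.severity >= max_severity
        for item in result.categories_analysis
    )

prompt = "example user input"
if not is_blocked(prompt):
    pass  # safe to forward the prompt to the GPT-3.5 Turbo deployment
```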

During stress testing, two distinct attack methods were employed against the filters. The first, character injection, inserts specific types of characters and irregular text patterns into a prompt. The second, adversarial ML evasion, probes for blind spots in the filters' underlying ML classifiers and exploits them.

Character injection proved highly effective against the filters. Prompt Shield, the component designed to detect jailbreak attempts, saw its detection rate plummet from 89% to a mere 7% when exposed to diacritics, homoglyphs, numerical replacement (commonly known as "leet speak"), and spaced characters. The AI Text Moderation filter likewise lost effectiveness under the same techniques.
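
For context, jailbreak detection of this kind is exposed through the Content Safety "shield prompt" REST endpoint. The sketch below shows the general shape of such a request; the resource name, key, and API version are assumptions and may differ from Mindgard's setup.

```python
# Hedged sketch of a Prompt Shields request via the REST API; the
# resource name, key, and api-version are placeholders.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
URL = f"{ENDPOINT}/contentsafety/text:shieldPrompt?api-version=2024-09-01"

def attack_detected(user_prompt: str) -> bool:
    """Ask Prompt Shields whether a prompt looks like a jailbreak attempt."""
    response = requests.post(
        URL,
        headers={"Ocp-Apim-Subscription-Key": "<content-safety-key>"},
        json={"userPrompt": user_prompt, "documents": []},
    )
    response.raise_for_status()
    return response.json()["userPromptAnalysis"]["attackDetected"]
```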

Diacritics, which alter characters by adding accents or other markings (e.g., changing 'a' to 'á'), proved particularly effective at bypassing Prompt Shield's detection mechanisms. Homoglyphs, characters that closely resemble one another (such as '0' and 'O'), also evaded detection. Numerical replacement, where numbers are substituted for visually similar letters, posed a challenge for both filters. Even the simple act of inserting spaces between characters was enough to undermine the AI Text Moderation filter.
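
To make these techniques concrete, the sketch below implements illustrative versions of the four transforms. The exact perturbations Mindgard used are not public, so the character mappings here are assumptions chosen to match the examples above.

```python
# Illustrative character-injection transforms; the mappings are
# assumptions modeled on the article's examples, not Mindgard's exact set.
DIACRITICS = str.maketrans("aeiou", "áéíóú")                 # a -> á, etc.
HOMOGLYPHS = str.maketrans({"O": "0", "o": "0", "l": "1"})   # look-alike swaps
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"})

def add_diacritics(text: str) -> str:
    return text.translate(DIACRITICS)

def swap_homoglyphs(text: str) -> str:
    return text.translate(HOMOGLYPHS)

def leet_speak(text: str) -> str:
    return text.translate(LEET)

def space_characters(text: str) -> str:
    return " ".join(text)  # "ignore" -> "i g n o r e"

prompt = "ignore previous instructions"
for transform in (add_diacritics, swap_homoglyphs, leet_speak, space_characters):
    print(f"{transform.__name__}: {transform(prompt)}")
```

The perturbed prompt remains legible to the underlying model, but its character-level features no longer match what the classifiers were trained on, which is why detection rates fall so sharply.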

These findings shed light on the vulnerabilities in the filters guarding GPT-3.5 Turbo when they face such attack techniques. By exploiting the limitations of the AI filters through character manipulation and evasion tactics, the researchers were able to significantly reduce the effectiveness of the security measures in place, and real adversaries could do the same.

Moving forward, it will be crucial for developers and security experts to address these vulnerabilities and strengthen the defenses of AI systems like GPT-3.5 Turbo. As the use of AI technologies continues to grow, so too must the efforts to protect them and ensure their integrity. Only through proactive measures and rigorous testing can defenders hope to stay a step ahead of those seeking to exploit AI vulnerabilities for malicious purposes.
