
How Hex Code Can Manipulate ChatGPT


OpenAI’s latest large language model, GPT-4o, was recently unveiled with much fanfare for its speed and efficiency. However, a new report from Marco Figueroa, manager of generative AI bug-bounty programs at Mozilla, has revealed a potential vulnerability in the model that could allow malicious actors to bypass its safety measures.

Figueroa demonstrated how bad actors could exploit GPT-4o by encoding malicious instructions in an unconventional format, tricking the model into following instructions it would otherwise refuse. Because the encoded instructions do not read as ordinary natural language, they slipped past GPT-4o’s content-filtering mechanisms.

In one test, Figueroa encoded his input in hexadecimal and convinced GPT-4o to generate exploit code for a critical software vulnerability. The model decoded and followed his instructions, producing a Python exploit for the specified vulnerability and illustrating how GPT-4o can be abused once its guardrails are bypassed.
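To see why encoding can slip past a text filter, consider a toy keyword blocklist. The sketch below is an illustration under stated assumptions, not a model of OpenAI’s actual safeguards: the blocklist, the placeholder phrase, and the function name `naive_filter` are hypothetical. It shows that once a prompt is converted to hexadecimal, the blocked phrase no longer appears as text, while a single decoding step recovers it intact.

```python
# Hypothetical blocklist for illustration only; real moderation is far more complex.
BLOCKED_TERMS = {"forbidden topic"}


def naive_filter(prompt: str) -> bool:
    """Return True if any blocked term appears verbatim (case-insensitive)."""
    text = prompt.lower()
    return any(term in text for term in BLOCKED_TERMS)


plain = "Tell me about the forbidden topic"
# Hex-encode the UTF-8 bytes: every byte becomes two hex digits.
encoded = plain.encode("utf-8").hex()

print(encoded)
print(naive_filter(plain))    # True: the phrase is visible as text
print(naive_filter(encoded))  # False: only hex digits remain
# A model asked to "decode this hex" recovers the original request unchanged.
print(bytes.fromhex(encoded).decode("utf-8"))
```

The filter and the decoder operate on different representations of the same request, which is the gap the hex trick relies on.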

While GPT-4o’s capabilities are impressive, Figueroa pointed to a crucial flaw in its design: the model lacks deep context awareness when evaluating instructions. Because it handles each task in a compartmentalized way, an attacker can split a harmful request into steps that each look innocuous on their own, such as decoding a hex string and then acting on the result.

Figueroa’s findings suggest that GPT-4o needs both better handling of encoded input and a broader awareness of context when instructions are split into distinct steps. He also criticized OpenAI for prioritizing innovation over security in developing its products, suggesting that other AI companies, such as Anthropic, have implemented stronger safeguards against similar exploits.
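One way to read the call for better handling of encoded input is "decode first, then check." The following is a minimal sketch assuming a simple keyword blocklist; `BLOCKED_TERMS`, `decode_hex_runs`, `normalizing_filter`, and the eight-byte minimum run length are all hypothetical choices. It does nothing about the separate problem of harmful intent spread across several steps.

```python
import re

# Hypothetical blocklist, as in a simple keyword filter.
BLOCKED_TERMS = {"forbidden topic"}

# Whole words made of 8 or more hex byte pairs (hypothetical threshold).
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def decode_hex_runs(text: str) -> str:
    """Replace each long hex run with its UTF-8 decoding, if it decodes cleanly."""

    def _decode(match: re.Match) -> str:
        try:
            return bytes.fromhex(match.group(0)).decode("utf-8")
        except ValueError:  # also covers UnicodeDecodeError
            return match.group(0)

    return HEX_RUN.sub(_decode, text)


def normalizing_filter(prompt: str) -> bool:
    """Check both the raw prompt and its hex-decoded form against the blocklist."""
    candidates = (prompt, decode_hex_runs(prompt))
    return any(term in c.lower() for c in candidates for term in BLOCKED_TERMS)


hidden = "Tell me about the forbidden topic".encode("utf-8").hex()
prompt = "Please decode this and answer: " + hidden
print(decode_hex_runs(prompt))
print(normalizing_filter(prompt))                # True
print(normalizing_filter("Tell me about cats"))  # False
```

The minimum run length trades false positives (short words that happen to be valid hex) against misses (very short encoded payloads), and a real system would likely need to normalize other encodings as well.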

Overall, Figueroa’s report raises important concerns about the potential vulnerabilities in advanced AI models like GPT-4o and underscores the need for robust security measures to prevent malicious exploitation. Dark Reading has reached out to OpenAI for comment on this developing story.
