
Popular LLMs Found to Generate Vulnerable Code by Default


Language Models Found to Produce Insecure Code, Sparking Concerns Among Developers

A recent analysis conducted by Backslash Security has shed light on a pressing concern in the software development community: many of the world’s leading large language models (LLMs) are generating insecure code by default. This revelation is particularly alarming considering the growing reliance on generative AI tools among software developers, who might unknowingly expose their applications to significant security vulnerabilities.

The research highlights a number of security risks associated with using generative AI to write code, especially when developers rely on simple or “naïve” prompts. These prompts, which typically ask the AI for code without clarifying security requirements, often lead to outputs laden with common vulnerabilities. Among the most serious issues identified are command injection, cross-site scripting (XSS) in both backend and frontend configurations, insecure file uploads, and path traversal vulnerabilities.
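The report does not reproduce the model outputs themselves, but the command injection risk it flags (CWE-78) typically takes a shape like the sketch below: a hypothetical ping helper that interpolates untrusted input into a shell string, shown next to a safer variant that avoids the shell and validates its argument. The function names and the ping task are assumptions for illustration only.

```python
import subprocess

# Pattern a naive "write me a ping utility" prompt can yield:
# untrusted input is interpolated into a shell command string,
# so a value like "example.com; rm -rf /" runs extra commands (CWE-78).
def ping_insecure(host: str) -> str:
    return subprocess.run(
        f"ping -c 1 {host}", shell=True, capture_output=True, text=True
    ).stdout

# Safer variant: no shell, arguments passed as a list, input validated first.
def ping_safer(host: str) -> str:
    if not host or not all(c.isalnum() or c in ".-" for c in host):
        raise ValueError("invalid hostname")
    return subprocess.run(
        ["ping", "-c", "1", host], capture_output=True, text=True
    ).stdout
```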

Yossi Pik, co-founder and Chief Technology Officer of Backslash Security, elaborated on the impact of the issue. He described the AI-generated code produced by what he called "vibe coding" as a potential nightmare for security teams: the new paradigm inevitably results in a deluge of code, a problem compounded by risks intrinsic to LLMs, such as "hallucinations" and varying sensitivity to prompts.

To delve into the specifics, Backslash Security analyzed seven contemporary versions of notable LLMs, including OpenAI’s GPT, Anthropic’s Claude, and Google’s Gemini models. The study aimed to evaluate how different prompting techniques influenced the models’ ability to generate secure code. The findings were eye-opening: even when the researchers employed naïve prompts requesting basic application code, every LLM tested produced outputs vulnerable to at least four of the top ten Common Weakness Enumeration (CWE) categories.

Interestingly, the study didn’t solely focus on naïve prompts but also experimented with different levels of instruction regarding security. Prompts that explicitly stated security needs or requested compliance with the Open Web Application Security Project (OWASP) best practices yielded better outcomes. Nevertheless, even under these circumstances, five out of the seven LLMs still generated code with inherent vulnerabilities.
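Backslash has not released its exact prompt wording, but the tiers of instruction it describes can be roughly illustrated as follows. The file-upload task and the OWASP-specific requirements are assumptions for illustration; the generic phrase "make sure you are writing secure code" is the wording reported in the study.

```python
# Illustrative only: approximations of the three levels of security
# specificity the study compares, from naive to standards-based.
PROMPTS = {
    # Naive: no mention of security at all.
    "naive": "Write a Flask endpoint that saves an uploaded file to disk.",
    # Generic nudge: security is mentioned, but with no concrete requirements.
    "generic_secure": (
        "Write a Flask endpoint that saves an uploaded file to disk. "
        "Make sure you are writing secure code."
    ),
    # Standards-based: ties the request to OWASP secure coding practices.
    "owasp": (
        "Write a Flask endpoint that saves an uploaded file to disk. "
        "Follow OWASP secure coding best practices: validate the filename, "
        "restrict file types and size, and store files outside the web root."
    ),
}
```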

Performance Discrepancies Among LLMs

Among the LLMs evaluated, OpenAI’s GPT-4o performed worst at secure code generation. Given naïve prompts, only 10% of its outputs were free of security vulnerabilities. With the more generic prompt, "make sure you are writing secure code," the success rate improved only marginally, to 20%. Prompts requiring adherence to OWASP secure coding best practices fared significantly better, with the model producing secure code in 65% of cases.

In contrast, the Claude 3.7-Sonnet model emerged as the most capable in terms of security. It produced secure code in 60% of cases when given naïve prompts and achieved a perfect score of 100% when prompted generically about secure coding.

Remarkably, none of the models produced code vulnerable to SQL injection, the third most prevalent CWE in open-source codebases. This led the researchers to speculate that the models have been explicitly trained to recognize and mitigate this issue while neglecting other prevalent vulnerabilities.
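For context, the mitigation the models appear to apply consistently here is query parameterization. A minimal sketch using Python's built-in sqlite3 module contrasts an injection-prone, string-built query with the parameterized form; the table and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

# Injection-prone: user input is concatenated into the SQL text (CWE-89).
# A value like "alice' OR '1'='1" would return every row.
def find_user_insecure(name: str):
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

# Parameterized query: the driver treats the value as data, not as SQL.
def find_user_safer(name: str):
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()
```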

The Road Ahead for Secure AI-Generated Code

Backslash Security emphasizes that these findings highlight the nascent state of generative AI tools when it comes to secure coding practices. Security teams urgently need to develop stricter guidelines for prompt creation; with such protocols in place, LLMs can be steered to consistently produce code that is secure by design.

The researchers also pointed out a significant gap in the industry, stating that developers are still grappling with the art of prompt engineering and cannot reasonably be expected to double as security experts. This presents a remarkable opportunity for security teams. By embedding the best practices they have imparted to developers over time into every piece of LLM-generated code, they can usher in an era of vulnerability-free coding.

In summary, while the findings from Backslash Security underline a major concern regarding the security of AI-generated code, they simultaneously illuminate the path forward for integrating robust security measures. It is crucial for the tech community to address these vulnerabilities before they lead to widespread exploitation, ensuring that advancements in AI do not come at the expense of security.
