
LLM-Generated Passwords Are Insecure; Your Codebase Might Confirm It


Temperature is Not a Remedy: Insights on Language Models and Password Security

A recent debate in the AI field has exposed a common misconception among practitioners about configuring large language models (LLMs): the belief that raising the sampling temperature can correct the distributional biases these models exhibit. A study titled "Irregular" presents evidence that directly contradicts this assumption.

The research indicates that setting the sampling temperature to 1.0—the upper limit for Claude, one of the notable LLMs—produces no statistically significant improvement in effective entropy. The character-position biases are embedded in the model's weights, not in its sampling parameters. While raising the temperature flattens the probability distribution from which characters are drawn, it operates downstream of the distributions the weights encode, so it cannot remove the underlying bias.
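The point about temperature acting downstream can be sketched in a few lines. The logits below are made-up illustrative numbers, not values from any real model; the takeaway is that temperature rescales a distribution but never reorders it, so a character the weights favor stays favored at every temperature.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, with temperature applied downstream."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three characters; the bias toward 'x' is baked
# into the logits themselves, i.e. into the model's weights.
logits = {"x": 4.0, "p": 2.0, "q": 0.0}

for t in (0.5, 1.0):
    probs = softmax_with_temperature(list(logits.values()), t)
    ranked = sorted(zip(logits, probs), key=lambda kv: -kv[1])
    print(t, ranked)
# At every temperature the ranking is identical: 'x' remains the most
# likely character, because dividing by a positive temperature is a
# monotonic transformation that rescales but never reorders the logits.
```

Higher temperatures make the distribution flatter (closer to uniform), which is why they can superficially look like a fix, but the relative preference encoded by the weights survives any finite temperature.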

To complement these findings, Alexey Antonov, Data Science Team Lead at Kaspersky, generated 1,000 passwords each with three different models: ChatGPT, Meta's Llama, and DeepSeek. The results revealed significant discrepancies in character frequency across the models. ChatGPT showed a marked preference for the characters 'x,' 'p,' and 'L'; Llama favored the hash symbol and the letter 'p'; and DeepSeek leaned toward 't' and 'w.' Notably, at a temperature of 0.0, Claude produced an identical string on every invocation, underscoring the structural nature of these biases.
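The kind of frequency analysis described above is straightforward to reproduce. The sketch below uses a toy batch of three strings standing in for the 1,000 sampled passwords per model in the study; the sample values are invented for illustration.

```python
from collections import Counter

def character_bias(passwords):
    """Return per-character relative frequencies across a password batch,
    ordered from most to least common."""
    counts = Counter()
    for pw in passwords:
        counts.update(pw)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.most_common()}

# Toy batch standing in for a model's sampled outputs (invented data).
sample = ["xP9pL2xk", "pLx7Qx1p", "x4pLm8xx"]
freqs = character_bias(sample)
print(list(freqs)[:3])  # → ['x', 'p', 'L']: the over-represented characters
```

Run against a real batch of model outputs, the head of this ranking is exactly the signature the study reports: each model's habitual characters stand out far above the uniform baseline.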

Such findings sharpen our understanding of the vulnerabilities in LLM-generated content, particularly for password generation. The practical security implications are profound. An adversary who knows which LLM generated a credential need not brute-force the full keyspace—94^16 for a 16-character password drawn from the printable ASCII set. Instead, they can build a model-specific attack dictionary that ranks candidate passwords by their empirical generation frequency, turning the search into a probabilistically ordered attack on an effective keyspace several orders of magnitude smaller than the nominal one.
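The gap between the nominal keyspace and a model-aware search can be sketched as follows. The character frequencies and candidate list here are hypothetical illustrations, and the scoring uses a simple independence assumption (summed log-frequencies) rather than the study's actual method.

```python
import math

# Worst-case guesses for uninformed brute force: 94 printable ASCII
# characters at length 16.
full_keyspace = 94 ** 16
print(f"brute force: ~{full_keyspace:.1e} guesses")  # ≈ 3.7e31

def rank_candidates(candidates, char_freq):
    """Order candidate passwords so the most model-typical strings come first.

    char_freq maps each character to its measured frequency in a corpus of
    passwords sampled from the target model (hypothetical numbers below).
    """
    def score(pw):
        # Sum of log-frequencies: under an independence assumption, this is
        # the log-probability that the model would emit this string.
        return sum(math.log(char_freq.get(c, 1e-9)) for c in pw)
    return sorted(candidates, key=score, reverse=True)

# Invented frequencies mimicking a ChatGPT-style bias toward x/p/L.
char_freq = {"x": 0.08, "p": 0.07, "L": 0.05, "q": 0.001}
guesses = rank_candidates(["qqqq", "xpxp", "xLpx"], char_freq)
print(guesses[0])  # → 'xpxp': the most model-typical candidate is tried first
```

An attacker working through candidates in this order concentrates their guesses where the model's probability mass actually lies, which is why the effective search is so much cheaper than the 94^16 worst case suggests.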

Kaspersky's cracking tests substantiate these concerns. An alarming 88 percent of passwords generated by DeepSeek and 87 percent from Llama failed to withstand targeted attacks, and 33 percent of ChatGPT's passwords fell as well—all using standard GPU hardware. These figures highlight the urgent need to reevaluate security practices around LLM-generated content.

As LLM capabilities expand, understanding their inherent biases and vulnerabilities becomes increasingly essential. Security professionals must recognize that adjusting sampling parameters such as temperature will not fix problems rooted in the models themselves; the structural biases encoded in the weights have to be examined directly.

In conclusion, the findings are a stark reminder of the limits of current approaches to AI-generated content and password management. The tendency of LLMs to produce predictable, biased output remains an open challenge for cybersecurity and digital authentication. Developers and security experts will need to collaborate on strategies that address these weaknesses—a conversation that grows more urgent as LLMs spread across sectors.

