Infostealer Malware Discovered in Hugging Face Repository: A Rising Threat to AI Supply Chains
Security researchers have uncovered infostealer malware concealed in a prominent repository on Hugging Face, a widely used platform for AI models and tools. The incident underscores the growing risks in the AI supply chain and shows that even popular repositories can be malicious.
The AI security firm HiddenLayer disclosed the threat in a detailed blog post on May 7, identifying the repository Open-OSS/privacy-filter as malicious. Within 18 hours of appearing among the platform's top trending repositories, it had racked up more than 244,000 downloads and 667 likes. HiddenLayer suggested that these numbers were "almost certainly artificially inflated," a tactic designed to enhance the repository’s legitimacy and lure unsuspecting users.
Further investigation revealed that the malicious repository employed typosquatting techniques, mimicking OpenAI’s authentic Privacy Filter release by replicating its model card almost verbatim. Such deceptive practices indicate a troubling trend where attackers exploit users’ trust in established brands to distribute malicious software.
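To illustrate how a lookalike of this kind might be caught before download, here is a minimal sketch that flags a repository whose model name closely matches a trusted one while its owner differs. The allowlist entry `openai/privacy-filter`, the function names, and the 0.8 threshold are assumptions made for illustration; they are not taken from HiddenLayer's report or from any Hugging Face API.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist: the official repo ID here is an assumption, not a verified value.
TRUSTED_REPOS = ["openai/privacy-filter"]


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so 'Privacy_Filter' and 'privacy-filter' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def lookalike_of(repo_id: str, trusted=TRUSTED_REPOS, threshold: float = 0.8):
    """Return the trusted repo that repo_id appears to imitate, or None."""
    if repo_id in trusted:
        return None
    owner, _, name = repo_id.partition("/")
    for candidate in trusted:
        t_owner, _, t_name = candidate.partition("/")
        score = SequenceMatcher(None, normalize(name), normalize(t_name)).ratio()
        # Same (or nearly the same) model name under a different owner is suspicious.
        if score >= threshold and owner.lower() != t_owner.lower():
            return candidate
    return None
```

A check like this only narrows the field. A match means the publisher should be verified by hand, and the absence of a match says nothing about whether a repository is safe.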
The attack operated through a sophisticated six-stage chain. Users who landed on the malicious repository were instructed to clone it and run either start.bat on Windows or python loader.py on Linux and macOS. This seemingly benign process concealed a more sinister intent. The Python script included a base64-encoded string which, upon execution, delivered a malicious executable, specifically a Rust-based infostealer.
The infostealer was equipped with multiple evasion techniques to bypass the victim’s security systems. According to HiddenLayer, the malware hid its use of Windows APIs to defeat static analysis. It also checked for debuggers, sandboxes, and virtual machines such as VirtualBox, VMware, QEMU, and Xen. To further protect its operation, the malware attempted to disable the Windows Antimalware Scan Interface (AMSI) and Event Tracing for Windows (ETW) in order to evade behavioral detection.
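Because the loader reportedly carried its payload as a base64-encoded string, one cheap precaution is to scan cloned scripts for unusually long base64 literals before running anything. The sketch below is a rough heuristic under stated assumptions: the 200-character minimum is arbitrary, the function names are hypothetical, and the check for an "MZ" header assumes a decoded blob would be a Windows executable, which the report as summarized here does not confirm for every stage.

```python
import base64
import binascii
import re
from pathlib import Path


def find_base64_blobs(source: str, min_len: int = 200):
    """Return (offset, decoded_size, starts_with_mz) for each long base64 run."""
    pattern = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    hits = []
    for match in pattern.finditer(source):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            continue  # the run was not valid base64 after all
        # "MZ" is the magic number at the start of Windows executables.
        hits.append((match.start(), len(decoded), decoded[:2] == b"MZ"))
    return hits


def scan_repo(root: str):
    """Scan every .py file under root and return {path: hits} for flagged files."""
    report = {}
    for path in Path(root).rglob("*.py"):
        hits = find_base64_blobs(path.read_text(errors="ignore"))
        if hits:
            report[str(path)] = hits
    return report
```

The scan only reads files and never executes decoded content. An attacker can evade it by splitting or re-encoding the string, so a clean result is not proof of safety.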
Once successfully deployed, the malware set out to harvest sensitive information, including browser passwords, session cookies, Discord tokens, cryptocurrency wallet credentials, Telegram sessions, and other critical data. The expansive range of stolen information poses a significant threat to users and organizations alike.
Recommended Mitigation Strategies
Given the severity of this threat, HiddenLayer urged users who cloned the rogue repository and executed the aforementioned scripts to consider their systems fully compromised. They emphasized that victims should refrain from logging into any accounts from the affected device until it has been thoroughly wiped clean.
In their advisory, HiddenLayer recommended isolating the compromised host immediately and rotating all credentials stored within browsers, password managers, or any credential stores on that device. This includes saved passwords, session cookies, OAuth tokens, SSH keys, FTP credentials (notably those associated with FileZilla), and any tokens linked to cloud services.
Moreover, users are advised to treat all browser sessions as compromised, even if the password was not saved, since captured session cookies can enable threat actors to circumvent Multi-Factor Authentication (MFA). The following actions are crucial:
- Transfer any cryptocurrency holdings to a new wallet created on a secure device, assuming that seed phrases, keystores, and wallet extension data may have been compromised.
- Invalidate any existing Discord sessions and reset Discord passwords, as the malware explicitly targets these credentials.
- Block the Indicators of Compromise (IOCs) outlined in HiddenLayer’s report and search historical connection logs to identify any additional hosts that may have been affected.
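The historical connection hunt in the last item can be sketched as a simple pass over exported connection logs. Everything specific below is a placeholder: the indicator values use reserved documentation names and addresses rather than anything from HiddenLayer’s report, and the CSV layout with `host` and `dest` columns is an assumed export format that will differ between proxies, firewalls, and EDR tools.

```python
import csv

# Placeholder indicators only; substitute the values published in the actual report.
IOC_DESTINATIONS = {"c2.example.invalid", "203.0.113.10"}


def hunt_connections(log_lines, iocs=IOC_DESTINATIONS):
    """Map each internal host to the IOC destinations it contacted.

    log_lines is any iterable of CSV lines with a header row containing
    'host' and 'dest' columns (an assumed, hypothetical log layout).
    """
    hits = {}
    for row in csv.DictReader(log_lines):
        dest = row["dest"].strip().lower()
        if dest in iocs:
            hits.setdefault(row["host"], set()).add(dest)
    return hits
```

For example, `hunt_connections(open("proxy_export.csv", newline=""))` would return a dictionary of hosts worth isolating, where `proxy_export.csv` is a hypothetical file name.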
Infostealer malware continues to spread, fueling a burgeoning cybercrime economy. Recent data published by KELA indicated that at least 347 million credentials had been acquired by infostealers that infected approximately 3.9 million machines. Such statistics highlight the need for users and organizations to be more vigilant in securing their digital infrastructure.
As cyber threats evolve, safeguarding AI supply chains becomes ever more critical. Users and developers alike must combine robust security measures with user education to reduce the risks posed by malicious repositories and software.
