Attackers have been exploiting weaknesses in repositories for open-source artificial intelligence models, such as Hugging Face, to inject malicious code undetected. This growing problem underscores the need for companies pursuing internal AI projects to implement robust security measures that can identify flaws and malicious code in their supply chains.
A recent analysis by ReversingLabs revealed that Hugging Face’s automated checks failed to detect malicious code in two AI models hosted on the platform. The threat actor used a familiar vector – data files in the Pickle format – with a new evasion method dubbed “NullifAI.” Although the attacks appear to be proofs-of-concept, the fact that they bypassed security checks and were hosted with a “No issue” tag shows that companies should not rely solely on a repository’s own safety mechanisms.
Tomislav Pericin, chief software architect at ReversingLabs, emphasized the risk associated with public repositories where various developers and machine learning experts can host their content, enabling malicious actors to misuse the platform. The vulnerability lies in the potential for someone to host a tainted version of a file, hoping organizations will inadvertently install it.
The majority of businesses are adopting AI and leveraging open-source models from repositories like Hugging Face, TensorFlow Hub, and PyTorch Hub for internal AI projects. A Morning Consult survey sponsored by IBM reported that 61% of companies are utilizing models from the open-source ecosystem to develop their AI tools.
One significant contributor to these vulnerabilities is the prevalent use of Pickle files. Pickle is a widely used Python serialization format that is inherently insecure because deserializing a Pickle file can execute arbitrary code. Despite years of warnings from security researchers, many data scientists continue to use the format, exposing organizations to risks such as code execution, backdoors, prompt injections, and alignment issues.
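The danger is easy to demonstrate with a few lines of standard-library Python: Pickle’s `__reduce__` hook lets a serialized object name an arbitrary callable to run at load time. The sketch below uses a deliberately harmless payload (`os.getcwd`); the class name and payload are illustrative, not taken from the attacks described above.

```python
import os
import pickle

class MaliciousModel:
    """Unpickling this object runs an attacker-chosen callable."""
    def __reduce__(self):
        # Pickle calls os.getcwd() during deserialization here; a real
        # attack could instead return (os.system, ("<shell command>",)).
        return (os.getcwd, ())

payload = pickle.dumps(MaliciousModel())

# The victim only has to *load* the file -- no further use is needed
# for the embedded call to execute.
result = pickle.loads(payload)
print(result)  # the current working directory, proof that code ran
```

This is why scanning or sandboxing untrusted Pickle files matters: the act of loading is itself the attack surface.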
Tom Bonner, vice president of research at HiddenLayer, highlighted the persistence of the Pickle format despite clear security risks. Several organizations have fallen victim to compromises through machine learning models, underscoring the need for a transition to safer data formats like Safetensors, a secure alternative endorsed by Hugging Face, EleutherAI, and Stability AI.
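The contrast with Safetensors is structural. A minimal sketch of the container layout, written here with only the standard library for illustration (real projects should use the official `safetensors` package, not this toy writer): the file is just an 8-byte little-endian header length, a JSON header describing dtypes, shapes, and byte offsets, and then flat tensor bytes. Loading it never deserializes code objects.

```python
import json
import struct

def write_safetensors(path, name, dtype, shape, raw_bytes):
    """Write a single-tensor file in the Safetensors layout:
    [8-byte LE header size][JSON header][raw tensor bytes]."""
    header = {name: {"dtype": dtype, "shape": shape,
                     "data_offsets": [0, len(raw_bytes)]}}
    encoded = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(encoded)))  # header length
        f.write(encoded)                          # JSON metadata only
        f.write(raw_bytes)                        # flat tensor data

def read_header(path):
    """Read back just the JSON metadata -- no code is ever executed."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(hlen))

# A 2x2 float32 tensor stored with no executable content at all.
write_safetensors("demo.safetensors", "weights", "F32", [2, 2],
                  struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
print(read_header("demo.safetensors"))
```

Because the header is plain JSON and the payload is raw bytes, a parser has nothing to execute, which is the core of the format’s safety argument.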
Further complicating the security landscape is licensing: open-source AI models are often published without the information needed to reproduce them entirely. Andrew Stiefel, a senior product manager at Endor Labs, warned that the licenses surrounding AI models are complex and carry implications for commercial product development.
Additionally, the alignment of AI models with developers’ and users’ values poses a challenge, as evidenced by instances where models like DeepSeek have been used to create malware. The continuous discovery of vulnerabilities in AI models highlights the need for rigorous testing and evaluation to ensure the integrity and safety of these systems.
Companies are advised to manage AI models the way they manage open-source dependencies: assess the source of the model, its development activity, its popularity, and its associated security risks. A holistic approach to evaluating and mitigating those risks is crucial to safeguarding against potential threats and vulnerabilities.

