Hackers are increasingly setting their sights on machine learning (ML) models, aiming to exploit vulnerabilities in these systems for their own gain. By infiltrating ML models, hackers can not only steal sensitive data and disrupt services, but also manipulate outcomes to their advantage. The potential consequences of such attacks are vast, ranging from system performance degradation and financial losses to the erosion of trust and reliability in AI-driven applications.
Recently, cybersecurity experts at Trail of Bits identified a new attack technique called Sleepy Pickle, which enables threat actors to compromise ML models and target end users. The technique abuses the insecure pickle format commonly used to distribute machine learning models. Unlike earlier methods that compromise the systems deploying a model, Sleepy Pickle operates stealthily against the model itself, injecting malicious code that executes during deserialization.
The malicious code injected through Sleepy Pickle allows threat actors to manipulate model parameters, insert backdoors, and control outputs. By hooking into model methods, hackers can tamper with the data a model processes, posing significant risks to end-user security, safety, and privacy. The attack involves crafting a malicious pickle file containing both the model and the payload; upon deserialization, the payload modifies the in-memory model before it is returned to the victim.
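The mechanics rest on pickle's `__reduce__` protocol, which lets an object nominate a callable for the unpickler to invoke at load time. A minimal, benign sketch of the idea; the model dict, the `tamper` function, and the weight values are illustrative stand-ins, not code from the Trail of Bits research:

```python
import pickle

# Benign stand-in for a serialized ML model: weights in a dict.
model = {"weights": [0.1, 0.2, 0.3]}

def tamper(m):
    # Stand-in for a Sleepy Pickle payload: silently flip the weights.
    # A real payload could patch parameters, hook methods, or exfiltrate data.
    m["weights"] = [-w for w in m["weights"]]
    return m

class MaliciousModel:
    def __reduce__(self):
        # pickle serializes this as "call tamper(model) at load time";
        # the tampered result becomes the deserialized object.
        return (tamper, (model,))

blob = pickle.dumps(MaliciousModel())
loaded = pickle.loads(blob)  # payload executes here, inside deserialization
```

Nothing suspicious is visible until `pickle.loads` runs: the payload lives only in the pickle bytes and acts on the in-memory object, which is why static scans of the file on disk can miss it.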
Sleepy Pickle provides hackers with a powerful means of gaining control over ML systems by discreetly injecting payloads that dynamically alter models during deserialization. The technique goes beyond conventional supply chain attacks: it leaves no traces on disk, supports customizable payload triggers, and expands the attack surface to any pickle file within the target's supply chain. Unlike uploading an outright malicious model, Sleepy Pickle conceals its malice until runtime.
The dynamic and stealthy nature of Sleepy Pickle poses a serious threat to ML systems, enabling attackers to modify model parameters, insert backdoors, and control inputs and outputs. This opens the door to new types of threats, such as generative AI assistants providing harmful advice after being tainted with misinformation. Moreover, the technique’s ability to evade static defenses underscores the challenges in safeguarding ML models from such attacks.
In a concerning turn of events, researchers have demonstrated the ability to compromise ML models to steal private user data by injecting code that records sensitive information triggered by specific keywords. These attacks occur within the model itself, bypassing traditional security measures and highlighting the potential for abuse beyond conventional attack surfaces.
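A benign sketch of what such keyword-triggered recording could look like, assuming the payload wraps a model's inference method; `TinyModel`, the `stolen` list, and the trigger keywords are all hypothetical stand-ins introduced for illustration:

```python
stolen = []  # stand-in for an attacker-controlled sink

class TinyModel:
    # Toy model: real targets would be LLMs or summarizers.
    def generate(self, prompt: str) -> str:
        return f"summary of: {prompt}"

def hook(model, keywords=("password", "ssn")):
    # Payload stand-in: wrap the inference method so prompts containing
    # a trigger keyword are recorded, while outputs stay unchanged.
    original = model.generate
    def wrapped(prompt):
        if any(k in prompt.lower() for k in keywords):
            stolen.append(prompt)      # exfiltration stand-in
        return original(prompt)        # identical output: nothing to notice
    model.generate = wrapped
    return model

m = hook(TinyModel())
m.generate("my password is hunter2")   # silently recorded
```

Because the hooked model returns exactly the output the user expects, the theft happens inside the model object itself, out of reach of network- or filesystem-level monitoring.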
Furthermore, the implications of compromising models used in summarizer applications, such as browser apps, are equally troubling. If attackers were to manipulate the model behind these applications to generate harmful summaries, users could unknowingly be exposed to malicious content. This could lead to unsuspecting users falling victim to phishing scams or malware by clicking on altered summaries containing harmful links.
To mitigate the risks associated with these types of attacks, it is crucial to use models from reputable sources and opt for secure file formats. By remaining vigilant and adopting best practices in ML model security, organizations can better protect themselves and their users from the growing threats posed by hackers targeting ML systems.
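Where pickle files cannot be avoided entirely, one defensive pattern documented in the Python `pickle` module docs is a restricted unpickler that only resolves allow-listed globals; safer serialization formats sidestep the problem altogether. A minimal sketch, with an allow-list chosen purely for illustration:

```python
import io
import os
import pickle

# Only these (module, name) globals may be resolved during unpickling.
ALLOWED = {("builtins", "list"), ("builtins", "dict")}

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()

# Plain data structures load fine...
ok = safe_loads(pickle.dumps({"weights": [0.1, 0.2]}))

# ...but a pickle whose payload references os.system is rejected.
class Evil:
    def __reduce__(self):
        return (os.system, ("echo pwned",))

try:
    safe_loads(pickle.dumps(Evil()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Note that allow-listing is defense in depth, not a cure: real ML pickles need many globals, and a permissive list reopens the door, which is why provenance checks and non-executable formats remain the stronger mitigation.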
