
Companies Investigate Strategies to Protect Data in the Era of LLMs


Large language models (LLMs) like ChatGPT have disrupted the data security market as companies seek ways to prevent employees from leaking sensitive and proprietary information to external systems. To address this concern, companies are taking significant measures such as banning employees from using these systems, implementing the basic controls provided by generative AI providers, and deploying data security services including content scanning and LLM firewalls. Recent research highlights the reality of the risk, with notable incidents at Samsung and studies finding that up to 4% of employees have entered sensitive data into such tools.

The situation is expected to worsen in the short term, as LLMs can efficiently surface valuable information from their training data when prompted appropriately. Ron Reiter, co-founder and CTO at data life cycle security firm Sentra, emphasizes the importance of technical solutions in combating the growing data security problem. He explains that “data loss prevention became much more of an issue because there’s suddenly … these large language models with the capability to index data in a very, very efficient manner.” Data that individuals send around therefore has a higher chance of landing in an LLM, where sensitive information becomes easier to find.

Companies have faced challenges in finding effective ways to address the risk of data leaks through LLMs. For example, Samsung banned the use of ChatGPT after engineers shared sensitive data, while Apple restricted its employees from using the system to prevent disclosure of proprietary information. Financial firms like JPMorgan have also imposed limits on employee usage of the service due to regulatory concerns. The risks associated with generative AI are heightened by the complex and unstructured nature of the data typically used in LLMs, which often defies conventional data security solutions that focus on specific types of sensitive data.

AI system providers have implemented some solutions, but concerns among organizations persist. OpenAI, for instance, has introduced data controls in ChatGPT that allow organizations to disable chat history and opt out of having their conversations used to train OpenAI's models. However, many organizations still do not feel comfortable having their employees send sensitive data to ChatGPT. Meanwhile, LLM providers are searching for ways to address these concerns and offer data leak prevention options, such as private instances that keep data internal to a company. Even with this option, however, the risk of sensitive data leakage remains: not all employees should have the same access to corporate data, and LLMs make it easy to find the most sensitive information.

Managing an internal LLM requires significant effort, including in-house machine learning (ML) expertise to implement and maintain these massive AI models. Organizations can train their own domain-specific LLMs using proprietary data, which offers maximum control over sensitive data protection. However, this option is only viable for organizations with the necessary ML and deep learning skills, computational resources, and budget.

Data security technologies can adapt to counter various potential data leakage scenarios associated with LLMs. For example, Sentra utilizes LLMs to identify complex documents that may constitute a leak if submitted to AI services. Threat detection firm Trellix monitors clipboard snippets and web traffic for potential sensitive data and blocks access to specific sites. A new approach known as LLM firewalls can prevent LLMs from ingesting risky data and returning improper responses. Additionally, legal and compliance teams can provide education and warnings to users and limit access to sensitive information. Granular rules for specific sensitive data types can also be created to define data loss prevention policies.
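The granular, rule-based approach described above can be illustrated with a minimal sketch. This is not how Sentra, Trellix, or any specific DLP product works internally; it simply shows the idea of checking an outbound prompt against per-data-type rules before it is allowed to reach an external LLM. The rule names and patterns are illustrative assumptions — real products use much richer detectors (ML classifiers, exact-data matching, document fingerprinting).

```python
import re

# Illustrative patterns for a few common sensitive-data types.
# A production DLP policy would define many more, tuned per organization.
DLP_RULES = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b"),
}

def scan_prompt(text: str) -> list[str]:
    """Return the names of the DLP rules the outbound prompt violates."""
    return [name for name, pattern in DLP_RULES.items() if pattern.search(text)]

def allow_submission(text: str) -> bool:
    """Block the prompt from reaching an external LLM if any rule matches."""
    return not scan_prompt(text)
```

A proxy or browser extension sitting between employees and the AI service could call `allow_submission` on each prompt, blocking or warning on matches — the same pattern, at a coarser granularity, that the SSE-style web-category blocking described below applies at the network edge.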

Companies can also leverage comprehensive security measures, such as adopting zero trust network access (ZTNA) and cloud security controls, along with firewall-as-a-service. This combination, referred to as the security services edge (SSE), enables organizations to treat generative AI as a distinct web category and block sensitive data uploads. Gartner’s Ravisha Chugh suggests that organizations should use the block option to prevent sensitive data from entering ChatGPT through web or API interfaces.

In conclusion, the emergence of LLMs has presented challenges to data security, requiring organizations to implement measures to prevent data leaks. Although risks remain, there are various technical solutions and data security methods available to mitigate these concerns. As the data security landscape evolves, organizations will need to stay vigilant and adapt to the changing threat landscape to safeguard their sensitive and proprietary information.

