Weighing Your Data Security Options for GenAI


(Image courtesy Fortanix)

No computer can be made completely secure unless it’s buried under six feet of concrete. However, with enough forethought into developing a layered security architecture, data can be secured enough for Fortune 500 enterprises to feel comfortable using it for generative AI, says Anand Kashyap, the CEO and co-founder of the security firm Fortanix.

When it comes to GenAI, there’s a host of things that keep Chief Information Security Officers (CISOs) and their colleagues in the C-suite up at night. For starters, there is the prospect of employees submitting sensitive data to a public large language model (LLM), such as Gemini or GPT-4. There’s also the potential for data that does make it into an LLM to spill back out of it.

Retrieval augmented generation (RAG) may lessen these risks somewhat, but embeddings stored in vector databases must still be protected from prying eyes. Then there are hallucination and toxicity issues to deal with. And access control is a perennial challenge that can trip up even the most carefully architected security plan.
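As a rough illustration of what protecting a vector database can involve, the sketch below encrypts the source text stored alongside each embedding, so a compromised store exposes only numeric vectors rather than plaintext. It uses the Python `cryptography` package; the embedding function and in-memory store are hypothetical stand-ins for a real embedding model and vector database, not a description of any vendor’s product.

```python
# Minimal sketch: encrypt document text before storing it next to its embedding,
# so a compromised vector store exposes only numeric vectors, not plaintext.
# Assumes the `cryptography` package; embed() and the in-memory store are
# hypothetical stand-ins for a real embedding model and vector database.
from cryptography.fernet import Fernet

key = Fernet.generate_key()    # in practice, held in an HSM or key-management service
fernet = Fernet(key)

vector_store = []              # stand-in for a real vector database collection

def embed(text: str) -> list[float]:
    # Hypothetical embedding function; a real system would call an embedding model.
    return [float(ord(c) % 7) for c in text[:8]]

def index_document(doc_id: str, text: str) -> None:
    vector_store.append({
        "id": doc_id,
        "embedding": embed(text),                     # searchable, but not human-readable
        "ciphertext": fernet.encrypt(text.encode()),  # plaintext is never stored
    })

def fetch_plaintext(doc_id: str) -> str:
    # Decryption happens only at retrieval time, after access checks (not shown).
    record = next(r for r in vector_store if r["id"] == doc_id)
    return fernet.decrypt(record["ciphertext"]).decode()

index_document("contract-42", "Acme Corp pays $1.2M on delivery.")
print(fetch_plaintext("contract-42"))
```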

Navigating these security issues as they relate to GenAI is a big priority for enterprises at the moment, Kashyap says in a recent interview with BigDATAwire.

“Large enterprises understand the risks. They’re very hesitant to roll out GenAI for everything they would like to use it for, but at the same time, they don’t want to miss out,” he says. “There’s a huge fear of missing out.”

LLMs pose unique data security challenges (a-image/Shutterstock)

Fortanix develops tools that help some of the biggest organizations in the world, including Goldman Sachs, VMware, NEC, GE Healthcare, and the Department of Justice, secure their data. At the core of the company’s offering is a confidential computing platform, which uses encryption and tokenization technologies to enable customers to process sensitive data in an environment secured under a hardware security module (HSM).

According to Kashyap, Fortune 500 companies can securely adopt GenAI by combining Fortanix’s confidential computing platform with other tools, such as role-based access control (RBAC) and a firewall with real-time monitoring capabilities.

“I think a combination of proper RBAC and using confidential computing to secure multiple parts of this AI pipeline, including the LLM, including the vector database, and proper policies and configurations which are monitored in real time–I think that can make sure that the data can stay protected in a much better way than anything else out there,” he says.
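As a simple illustration of the RBAC piece of that combination, the sketch below gates a request on the caller’s role before anything reaches retrieval or the model. The role names, data classifications, and answer_with_rag() helper are hypothetical placeholders, not Fortanix APIs.

```python
# Minimal sketch of an RBAC gate in front of a GenAI pipeline: a request is only
# passed to retrieval and the LLM if the caller's role is allowed to touch the
# data classification involved. Role names, classifications, and answer_with_rag()
# are hypothetical placeholders, not Fortanix APIs.
ALLOWED = {
    "analyst": {"public", "internal"},
    "finance": {"public", "internal", "confidential"},
    "admin":   {"public", "internal", "confidential", "restricted"},
}

def answer_with_rag(question: str) -> str:
    return f"(model answer to: {question})"   # stand-in for retrieval + LLM call

def handle_request(user_role: str, data_class: str, question: str) -> str:
    if data_class not in ALLOWED.get(user_role, set()):
        raise PermissionError(f"role '{user_role}' may not query {data_class} data")
    return answer_with_rag(question)

print(handle_request("finance", "confidential", "Summarize Q3 vendor spend."))
```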

A data cataloging and discovery tool, one that can identify sensitive data in the first place and flag new sensitive data as it appears over time, is another component that companies should add to their GenAI security stack, the security executive says.

“I think a combination of all of these, and making sure that the entire stack is protected using confidential computing, that will give confidence to any Fortune 500, Fortune 100, government entities to be able to deploy GenAI with confidence,” Kashyap says.
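As a bare-bones illustration of the discovery step Kashyap mentions, the sketch below scans text for a few common identifier patterns and reports what it finds. Real cataloging tools rely on far richer classifiers (machine learning models, dictionaries, column profiling); these regexes are illustrative only.

```python
# Minimal sketch of sensitive-data discovery: scan text for a few common
# identifier patterns. Real cataloging tools use much richer classifiers;
# these regexes are illustrative only.
import re

PATTERNS = {
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn":      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def discover(text: str) -> dict[str, list[str]]:
    findings = {}
    for label, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            findings[label] = hits
    return findings

sample = "Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(discover(sample))   # {'email': [...], 'us_ssn': [...], 'card_number': [...]}
```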

Anand Kashyap is the CEO and co-founder of Fortanix

Still, there are caveats (there always are in security). As previously mentioned, Fortune 500 companies are a bit gun-shy around GenAI at the moment, thanks to several high-profile incidents where sensitive data has found its way into public models and leaked out in unexpected ways. That’s leading these firms to err on the side of caution with GenAI, and only greenlight the most basic chatbot and co-pilot use cases. As GenAI gets better, these enterprises will come under increasing pressure to expand their usage.

The most security-sensitive enterprises are avoiding the use of public LLMs entirely due to the data exfiltration risk, Kashyap says. They might use a RAG technique because it allows them to keep their sensitive data close to them and only send out prompts. However, some institutions are hesitant to use even RAG techniques because of the need to properly secure the vector database, Kashyap says. These organizations instead are building and training their own LLMs, often using open source models such as Meta’s Llama 3 or Mistral’s models.

“If you are still worried about data exfiltration, you should probably run your own LLM,” he says. “My recommendation would be for companies or enterprises who are worried about sensitive data not use an externally hosted LLM at all, but to use something that they can run, they can own, they can manage, they can look at it.”
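For organizations taking that route, the sketch below shows the general shape of running an open-weight model in-house with the Hugging Face transformers library, so prompts never leave infrastructure they control. It assumes the weights are downloaded and appropriately licensed locally; the model ID is only an example, not a Fortanix recommendation.

```python
# Minimal sketch of the "run your own LLM" approach: load open weights and
# generate entirely on infrastructure you control, so prompts never leave it.
# Assumes the Hugging Face `transformers` library and locally available,
# appropriately licensed weights; the model ID is an example, not a recommendation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"   # any local open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize our internal data-retention policy in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```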

Fortanix is currently developing another layer in the GenAI security stack: an AI firewall. According to Kashyap, this solution (which he says currently has no timeline for delivery) will appeal to organizations that want to use a publicly available LLM and want to maximize their security protection around it.

“What you need to do for an AI firewall, you need to have a discovery engine which can look for sensitive information, and then you need a protection engine, which can either redact it or maybe tokenize it or have some kind of a reversible encryption,” Kashyap says. “And then, if you know how to deploy it in the network, you’re done.”
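A stripped-down sketch of that discover-then-protect flow appears below: sensitive values are swapped for reversible tokens before the prompt leaves the network, and restored in the model’s response. The detection regex and token scheme are illustrative; a real AI firewall would sit in the network path with much broader discovery and policy controls.

```python
# Minimal sketch of the AI-firewall idea Kashyap describes: detect sensitive
# values in an outbound prompt, swap them for reversible tokens, and restore
# them in the model's response. The detection regex and token scheme are
# illustrative, not a Fortanix implementation.
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def tokenize_prompt(prompt: str) -> tuple[str, dict[str, str]]:
    mapping = {}
    def _swap(match: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token
    return EMAIL.sub(_swap, prompt), mapping

def detokenize_response(response: str, mapping: dict[str, str]) -> str:
    for token, original in mapping.items():
        response = response.replace(token, original)
    return response

safe_prompt, mapping = tokenize_prompt("Draft a reply to jane.doe@example.com about renewal.")
print(safe_prompt)   # sensitive value replaced before the prompt leaves the network
# response = call_external_llm(safe_prompt)   # hypothetical external LLM call
response = "Sure, here is a draft addressed to <PII_0>."
print(detokenize_response(response, mapping))
```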

However, the AI firewall won’t be a perfect solution, Kashyap says, and use cases involving the most sensitive data will probably require the organization to adopt its own LLM and run it in-house. “The problem with firewalls is there’s false positives and false negatives. You can’t stop everything, and then you stop too much,” he says. “It will not solve all use cases.”

GenAI is changing the data security landscape in big ways and forcing enterprises to reconsider their approaches. The emergence of new techniques, such as confidential computing, provides additional security layers that will give enterprises the confidence to move forward with GenAI tech. However, even the most advanced security technology won’t do an enterprise any good if it isn’t taking basic steps to secure its data.

“The fact of the matter is, people are not even doing basic encryption of data in databases,” Kashyap says. “Lots of data gets stolen because that was not even encrypted. So there’s some enterprises which are further along. A lot of them are much behind and they’re not even doing basic data protection, data security, basic encryption. And that could be a start. From there, you keep improving your security status and posture.”

Related Items:

GenAI Is Putting Data in Danger, But Companies Are Adopting It Anyway

New Cisco Study Highlights the Impact of Data Security and Privacy Concerns on GenAI Adoption

ChatGPT Growth Spurs GenAI-Data Lockdowns


