Before branching out into data governance, Alation made its mark as a data catalog company, a category it is credited with creating. And with today’s release of AI Governance, the Silicon Valley firm is expanding once again, this time by delivering tools to govern the flow of data in AI environments.
As the explosion of generative AI continues, organizations are finding that the technology brings its share of risks as well as rewards. For instance, users may upload sensitive or personally identifiable data to large language models (LLMs) running in the cloud. Sensitive or copyrighted data may find its way into answers generated by LLMs. And LLMs have a tendency to fabricate responses (hallucinate) and to generate biased or toxic content.
These risks (among others) are driving the creation of regulations, such as the EU’s AI Act, to set the guardrails on what is and what is not permitted. Organizations are scrambling to figure out how to get a handle on their AI activities across a number of different areas, including their GenAI-related data flows.
With its new AI Governance solution, Alation is targeting a few aspects of AI governance, but by no means all of them. According to Satyen Sangani, co-founder and CEO of the company, AI Governance is primarily geared toward identifying the data involved in AI training and AI inference, where that data is coming from, and what people and uses are involved in those AI data flows.
“I think everybody needs to understand the provenance of these models, what models that they have, how they’re being leveraged, and what regulations would apply to them,” Sangani tells BigDATAwire in an interview. “[AI Governance] gives you the ability to track all of what you need to in order to be able to make sure that you are running a low-risk and compliant AI operation.”
AI Governance builds on top of Alation’s existing metadata-based cataloging solution and leverages its tag-based tracking system to enable customers to track the lineage of data that makes it into LLMs, as well as what data is being used for fine-tuning and retrieval augmented generation (RAG). If a customer doesn’t already have Alation’s data catalog, one is implemented as part of AI Governance, Sangani says.
Customers should not look to AI Governance for tracking how AI models themselves change over time. “We are not tracking any given version of an LLM and trying to talk about the diff,” Sangani says. “What we’re finding is that customers are deploying GPT 3.5, 4, Strawberry, and they’re now trying to say, okay, here’s the data that I’m feeding it, the products that I’m feeding this information, and here are the people that are doing it.”
The GenAI revolution hit so fast that even this basic information is not tracked anywhere, which is why Alation is building it. Its approach is to build a conceptual model of an AI model that can be quickly referenced to get an idea of how that AI model interacts with an organization’s data, Sangani says.
Alation leverages the metadata tracking capability to trace the data flows in GenAI applications. Alation figures out what file system an organization is using to store the unstructured call logs that are used to train a customer service chat bot, for example. It also tracks which LLM created embeddings from that data, and what vector database is used to store those embeddings. The software then tracks how all this changes over time.
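To make that kind of tracking concrete, here is a minimal Python sketch of a lineage log for a single GenAI application. It is not Alation's API or data model: the class names, field names, storage path, and model and database names are all hypothetical, chosen only to mirror the chain described above (source storage, embedding model, vector database) and to show how changes over time might be surfaced.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class LineageRecord:
    """One observed snapshot of a GenAI data flow (all fields are hypothetical)."""
    application: str      # e.g. a customer service chatbot
    source_location: str  # where the unstructured call logs are stored
    embedding_model: str  # which model created embeddings from that data
    vector_store: str     # which vector database holds the embeddings
    observed_at: datetime


@dataclass
class LineageLog:
    """Append-only log of snapshots, queryable per application."""
    records: List[LineageRecord] = field(default_factory=list)

    def observe(self, record: LineageRecord) -> None:
        self.records.append(record)

    def history(self, application: str) -> List[LineageRecord]:
        # Snapshots for one application, oldest first.
        matching = [r for r in self.records if r.application == application]
        return sorted(matching, key=lambda r: r.observed_at)

    def changes(self, application: str) -> List[Tuple[str, str, str]]:
        # (field, old value, new value) for each difference between
        # consecutive snapshots of the same application.
        tracked = ("source_location", "embedding_model", "vector_store")
        hist = self.history(application)
        found = []
        for prev, curr in zip(hist, hist[1:]):
            for name in tracked:
                old, new = getattr(prev, name), getattr(curr, name)
                if old != new:
                    found.append((name, old, new))
        return found


# Illustrative data only: two snapshots of the same hypothetical chatbot.
log = LineageLog()
log.observe(LineageRecord(
    "support-chatbot", "s3://example-bucket/call-logs/",
    "embedding-model-a", "vector-db-a",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
))
log.observe(LineageRecord(
    "support-chatbot", "s3://example-bucket/call-logs/",
    "embedding-model-b", "vector-db-a",
    datetime(2024, 6, 1, tzinfo=timezone.utc),
))

for change in log.changes("support-chatbot"):
    print(change)
```

In this sketch, only the embedding model differs between the two snapshots, so that is the only change reported. A real catalog would presumably derive such records from harvested metadata rather than hand-written entries.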
AI Governance is helping to address different versions of fear, Sangani says.
“So one version of fear is all these regulations are coming. I don’t really know what to do about it, and I don’t really know what I need to prove to you” that the GenAI application is kosher. “There’s another version of that that’s like, I don’t even know what I have, in order to be able to know what I need to comply.”
Even if you have the inventory of use cases and projects, the next question is, at what stage of development or deployment is the GenAI application? Customers may have a good idea of what’s in pilot versus what’s been rolled into production, but tracking the use of data along that DevOps journey is another challenge altogether, Sangani says.
“How do I actually reproduce that information?” he says. “I think it’s not necessarily that people are badly intended, but these data landscapes are really complicated, and it’s not necessarily clear how this stuff was produced.”
Tracking data pipelines was hard enough when data was mostly created and consumed in deterministic applications, Sangani says. When you add the probabilistic nature of GenAI applications, the fuzziness factor starts to become a real challenge. While the space is maturing rapidly, there are real frustrations, he says. “So it’s a fun game.”
Related Items:
Alation Turns to GenAI to Automate Data Governance Tasks
Data Culture Report: More Investment Needed, Alation Says
Alation Springs Into Data Governance