Data storage is all the rage again, with global data predicted to reach 200 zettabytes by 2025, half of which will be in the cloud. Only five years ago, that number was 41 zettabytes.
One of the primary reasons for the rapid increase in data volume is the shift from text-based to multimodal models and the push toward ever more capable artificial intelligence systems. Many companies are battling for the AI crown, but Vast Data Inc., the fast-growing data computing platform company, has positioned itself as the backbone for a data space shaped by machine learning.
The company’s prospects look bright: the market for next-generation storage is anticipated to reach $150 billion by 2032, growing at a compound annual growth rate of 10%, and Vast has lucrative partnerships with Nvidia Corp. and cloud service providers. That raises the question of what Vast brings to data storage and how it fits into the AI ecosystem.
“There’s a huge data problem,” said Jeff Denworth, co-founder of Vast, in an exclusive interview with theCUBE. “We sit in the center of that whole pipeline. Our system is both a distributed enterprise data warehouse and a distributed unstructured data store, so every single step of the pipeline can be solved with a unified high-performance and extremely affordable enterprise platform. It’s way more than just training. There’s a whole lot more that goes into it.”
Last year, Vast introduced its Data Platform on theCUBE, SiliconANGLE Media’s livestreaming studio, selling it as a way for developers to interact with data regardless of file type or location. Now, at its Cosmos event in October, the company is preparing to introduce its Data Engine, another layer of software that orchestrates events in the AI pipeline. TheCUBE will be providing exclusive coverage of the event. (* Disclosure below.)
How the Vast Data Platform manages unstructured data
Vast has taken an unconventional approach to data, fusing the concepts of data storage and database in order to manage both unstructured and structured data for the era of AI. Users can interface with all of their data, on-premises and in the cloud, on Vast’s distributed platform.
“In my career, I’ve seen a lot of data companies start expanding beyond their core capabilities into the stack,” said Sanjeev Mohan, industry analyst and principal at SanjMo, during Vast’s Build Beyond event. “This is the first time I’ve seen a storage company come into the picture and start offering all of these new capabilities.”
The Vast Data Platform has four components: the DataStore, DataSpace, DataBase and DataEngine. The DataStore provides universal storage built on clusters of the company’s Disaggregated Shared Everything architecture. DASE departs from the shared-nothing designs popularized by Google by separating compute logic from the system state. This allows clusters to scale the platform’s data capacity independently of the CPUs.
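For readers less familiar with the distinction, the toy sketch below illustrates the disaggregation idea: stateless compute workers all operate against one shared state layer, so either tier can be grown on its own. This is illustrative only and not Vast’s actual API; every class and name here is hypothetical.

```python
# Illustrative-only sketch of disaggregation: stateless compute workers
# share one state layer, so compute and capacity scale independently.
# None of these classes represent Vast's real interfaces.

class SharedStateLayer:
    """Stands in for the pooled state tier that every CPU can reach."""
    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


class ComputeNode:
    """Stateless worker: it owns no data, only a handle to the shared state."""
    def __init__(self, name, state: SharedStateLayer):
        self.name = name
        self.state = state

    def process(self, key):
        record = self.state.get(key)
        return f"{self.name} processed {record!r}"


state = SharedStateLayer()                # capacity grows by expanding this tier
state.put("doc-1", "raw sensor log")

workers = [ComputeNode(f"cpu-{i}", state) for i in range(3)]  # CPUs grow separately
print([w.process("doc-1") for w in workers])
```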
Eliminating east-west traffic is one of the biggest selling points of the Vast DataBase, a transactional data warehouse that merges the capabilities of traditional databases, data warehouses and data lakes. It offers an alternative to the data lake and data lakehouse architectures of Snowflake Inc. and Databricks Inc., focusing on the unstructured data being processed by AI.
“We started from the storage layer,” Denworth said. “The reason that we wanted to do this is that we saw deep learning and the power of these really big AI supercomputers that Nvidia was going to be building, requiring a more scalable, more high-performance, real-time architecture. We think about [Vast DataBase] as an exabyte scale, transactional data warehouse, designed intentionally for the power of flash and when you put these two things together, now you have this unlimited scale information architecture that turns unstructured data into structured data, and those aren’t the types of things we think Snowflake and Databricks are really working on.”
By streaming data into the platform at any level of scale, Vast claims to have closed the observability gap present in data lake and data lakehouse architectures. Layered on top is the Vast DataSpace, which unifies multiple clouds and allows users to flow all of their tables, files and records into one pipeline.
“What comes out of it is a product that is more efficient and more scalable than the products that [Snowflake and Databricks] built for their purposes but is so much more broad in its capability because it also supports these different data types that don’t naturally fit in a data warehouse,” Denworth said.
Transforming the AI ecosystem
Vast is all in on flash, with an architecture that makes flash more affordable and solves the input/output bottlenecks that have long plagued the storage market. The result is anywhere from a twofold to a 20-fold improvement in pipeline performance, according to Denworth.
The final layer of simplification appears to be Vast’s DataEngine, which builds on the company’s collaboration with Nvidia and provides a server-level engine to orchestrate AI pipelines. It is based on event triggers and functions — any I/O event in the DataStore can trigger a function for the DataEngine to execute without explicit instruction from the AI developer.
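As a rough illustration of that trigger-and-function pattern, the sketch below maps an I/O event to a handler that runs without the developer issuing an explicit call. It is a generic event-dispatch example, not the DataEngine’s real interface; the registry, decorator and event names are all hypothetical.

```python
# Minimal sketch of an event-trigger/function pattern using a simple
# in-process registry. Illustrative only; not Vast's DataEngine API.

from typing import Callable, Dict, List

_handlers: Dict[str, List[Callable[[dict], None]]] = {}

def on_event(event_type: str):
    """Register a function to run whenever the named I/O event fires."""
    def register(func: Callable[[dict], None]):
        _handlers.setdefault(event_type, []).append(func)
        return func
    return register

def emit(event_type: str, payload: dict):
    """Called by the storage layer when an I/O event occurs."""
    for handler in _handlers.get(event_type, []):
        handler(payload)  # runs without an explicit call from the developer

@on_event("object_written")
def enqueue_for_embedding(payload: dict):
    print(f"queueing {payload['key']} for embedding")

# Simulate a new object landing in the data store:
emit("object_written", {"key": "reports/q3-summary.pdf"})
```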
“We [saw] the application stack, broadly speaking, on top of all those data problems … and realized that that is equally complicated,” Denworth said. “Considering that these are all data-driven challenges that customers are facing, what we wanted to do was pull the application stack into our systems to make it easier for organizations to go and implement enterprise AI across organizations and unlock the secrets that’s in their data using AI tools.”
In March, Vast announced that it would use Nvidia’s BlueField-3 data processing unit technology to run its data storage services at massive scale, as well as a partnership with Super Micro Computer Inc., a global supplier of AI hardware. Denworth hinted at a flurry of announcements with key partners, including major leaders in the IT industry, at the Cosmos event in October.
“The objective is to basically start to enable real-time, data-driven AI across every enterprise,” he said.
Building on partnerships in the cloud space
Another set of Vast collaborators are cloud GPU providers, which are growing in popularity in part because they allow companies to rent expensive computing resources instead of buying them.
“The thing that CSPs have more than anything else is power and the ability to build scalable and reliable infrastructure really quickly,” said Denworth, in an earlier interview. “They become the destination for next-generation enterprise workloads that simply aren’t able to be deployed in a customer’s four walls.”
Vast has already announced partnerships with AI cloud providers CoreWeave and Lambda Labs, so October should reveal how its various collaborations have borne fruit, along with further details on the Vast DataEngine.
“You’re going to see us hang out the promises we made last year in terms of furthering a next-generation, game-changing systems architecture that solves a ton of customer problems,” Denworth said. “Vast is in a leadership position as organizations start to reshape their infrastructure agenda in the world of AI. There is a movement happening around and within Vast Data that is just too big to be ignored anymore.”
(* Disclosure: TheCUBE is a paid media partner for the Vast Data “Cosmos” event. Neither Vast Data Inc., the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Image: SiliconANGLE/Canva