At its Data + AI Summit today, Databricks announced that it’s open sourcing Unity Catalog, the metadata catalog that governs how users and compute engines can access data. Coming on the heels of last week’s news around Apache Iceberg, the move marks an important shift for Databricks, which is seeking to maintain momentum as customers increasingly demand open lakehouse platforms.
Databricks unveiled Unity Catalog back in 2021 as a way to govern and secure access to data stored in Delta, the table format that Databricks created in 2017 as the linchpin of its lakehouse strategy. It has remained a proprietary product at Databricks since.
But in recent years, a competing table format, Apache Iceberg, has gained momentum in the big data ecosystem. Databricks addressed Iceberg’s rise last week with the planned acquisition of Tabular, the lakehouse company founded by Iceberg’s creator. Databricks’ strategy is to gradually move the Iceberg and Delta specifications closer together over time, thereby eliminating the differences between them.
That left the humble metadata catalog as the last piece standing between customers and their dream of a truly open data lakehouse. Databricks’ rival, Snowflake, addressed the potential lock-in of the metadata catalog last week with the launch of Polaris, which is based on Iceberg’s REST-based API. The company tells Datanami that it plans to donate the Polaris project to open source, likely the Apache Software Foundation, within 90 days.
That left the still-proprietary Unity Catalog as the odd man out at the metadata catalog layer, just as a new era of open lakehouses was arriving. To address that strategic shift in the market, Databricks decided to open source Unity Catalog.
The move creates the “USB” for data access, Databricks CEO Ali Ghodsi said during his keynote address at Databricks’ Data + AI Summit in San Francisco.
“All the silos that you had before, they can just access one copy of the data that’s in a standardized USB format under your ownership,” Ghodsi said. “It goes through one governance layer that’s just standardized–that’s Unity Catalog–for all of your data.”
Unity Catalog already supported Delta and Iceberg, as well as Apache Hudi, another open table format, via Databricks’ Delta Lake UniForm format. Unity Catalog also supports Iceberg’s REST-based API, Ghodsi pointed out.
“We basically standardized the data layer and the security layer so that you own your data and everything goes through these open interfaces,” he said. “And I think that’s going to be awesome for the community, for everybody in here. Because we just have way more use cases. We’re going to be able to do much more innovation, and we’ll just expand this market for everybody involved.”
Databricks customers applauded the move, including AT&T and Nasdaq.
“With the announcement of Unity Catalog’s open sourcing, we are encouraged by Databricks’ step to make lakehouse governance and metadata management possible through open standards,” said Matt Dugan, AT&T’s vice president for data platforms. “The flexibility to utilize interoperable tools with our data and AI assets, with consistent governance, is core to the AT&T data platform strategy.”
“Databricks’ decision to open source Unity Catalog provides a solution that helps eliminate data silos and we look forward to further scaling our platform, enhancing our governance, and modernizing our data applications as we continue to deliver for our clients,” said Lenny Rosenfeld, Nasdaq’s vice president of capital access platforms.
It’s not clear which open source foundation Databricks will choose for Unity Catalog OSS, nor what the timeline will be. Previously, Databricks has chosen The Linux Foundation to open source various internally developed products, including Delta and MLflow.
Unity Catalog will be posted to GitHub on Thursday during Databricks CTO Matei Zaharia’s keynote at Data + AI Summit, the company said.
Related Items:
All Eyes on Databricks as Data + AI Summit Kicks Off
Databricks Nabs Iceberg-Maker Tabular to Spawn Table Uniformity
Snowflake Embraces Open Data with Polaris Catalog