Databricks + Tabular

We are excited to announce that we have agreed to acquire Tabular, Inc., a data management company founded by Ryan Blue, Daniel Weeks, and Jason Reid. This acquisition brings together the original creators of Apache Iceberg™ and those of Linux Foundation Delta Lake, the two leading open source lakehouse formats. Together, we are going to lead the way on data compatibility so that you are no longer limited by which lakehouse format your data is in. This blog explains how we intend to work closely with the Iceberg and Delta Lake communities to bring format compatibility to the lakehouse: in the short term inside Delta Lake UniForm, and in the long term by evolving toward a single, open, and common standard of interoperability. We look forward to welcoming the team once the transaction closes, and we are excited to work with them toward our joint vision of the open lakehouse.

The rise of lakehouse architecture and the format incompatibility

The lakehouse architecture was pioneered by Databricks in 2020 to enable the integration of traditional data warehousing workloads with AI workloads on a single, governed copy of data. For this to work, ALL the data had to be in an open format – that way different workloads, applications and engines could access the same data. Lakehouse architecture maximizes enterprise productivity by democratizing access to data. This is in contrast to proprietary data warehouses where only a proprietary SQL engine can read, write or share the data, and data often has to be copied and exported to be used by other applications, creating a high degree of vendor lock-in. Four years later, the lakehouse architecture has taken the market by storm – 74% of enterprises have deployed a lakehouse according to a survey conducted by the MIT Technology Review.

The foundation of the lakehouse is open source data formats that enable ACID transactions on data stored in object storage. These formats dramatically improve the reliability and performance of data operations on the data lake and were specifically designed for open source engines such as Apache Spark™, Trino and Presto. To provide such a format, we worked with the Linux Foundation to create the Delta Lake project. We have been humbled by Delta Lake’s adoption since its inception: the open source project has over 500 code contributors from a diverse set of organizations, and over 10,000 companies globally use Delta Lake to process 4+ exabytes of data on average each day.

Around the same time Delta Lake was created, Ryan and Daniel developed the Iceberg project at Netflix and donated it to the Apache Software Foundation. These two projects have emerged as the two leading open source standards for lakehouse formats. Unfortunately, even though both formats are based on Apache Parquet and share similar goals and designs, they became incompatible because they were developed independently.

Over time, a number of other open source and proprietary engines adopted these formats. However, they usually adopted only one of the standards, and more often than not, only part of that standard. This has effectively fragmented and siloed enterprise data, undermining the value of the lakehouse architecture.

The Road to Interoperability

Fundamentally, companies need data interoperability to realize the benefits of the lakehouse. We intend to work closely with the Iceberg and Delta Lake communities to bring interoperability to the formats themselves. This is a long journey, one that will likely take several years to achieve in those communities. That’s why we introduced Delta Lake UniForm to the world last year. UniForm tables provide interoperability across Delta Lake, Iceberg, and Hudi, and support the Iceberg REST catalog interface so that companies can use the analytics engines and tools they’re already familiar with, across all their data. UniForm is generally available, so you can get compatibility today, and with the addition of the original Iceberg team, we are going to invest heavily to greatly broaden the ambitions of Delta Lake UniForm.
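For readers who want a concrete picture of what turning on UniForm looks like, here is a minimal sketch rather than an official recipe. It assumes the Delta table properties `delta.enableIcebergCompatV2` and `delta.universalFormat.enabledFormats` as documented for UniForm; the helper name, table name, and columns are hypothetical, and the properties required may differ by runtime version.

```python
# Hypothetical helper: builds a CREATE TABLE statement for a Delta table
# with UniForm Iceberg metadata generation enabled. It only produces the
# SQL text; running it requires an engine that supports UniForm.
def uniform_create_table_sql(table_name: str, columns: dict[str, str]) -> str:
    # Render the column definitions, e.g. "id BIGINT, amount DOUBLE".
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table_name} ({cols}) USING DELTA "
        "TBLPROPERTIES ("
        # Assumed property names; check the docs for your runtime.
        "'delta.enableIcebergCompatV2' = 'true', "
        "'delta.universalFormat.enabledFormats' = 'iceberg')"
    )


# Illustrative table name and schema (not from the announcement).
sql = uniform_create_table_sql(
    "main.sales.orders", {"id": "BIGINT", "amount": "DOUBLE"}
)
print(sql)
```

On a cluster with a Spark session available, the resulting string could be passed to `spark.sql(sql)`; Iceberg clients would then read the same underlying Parquet files through the generated Iceberg metadata.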

A Shared Commitment to Openness

Finally, Databricks and Tabular share a history of championing open source formats. Both companies were founded to commercialize open source technologies created by their founders. Today, Databricks is the largest and most successful independent open source company by revenue and has donated 12 million lines of code to open source projects. This acquisition highlights our commitment to open formats and open source data in the cloud, helping ensure that companies are in control of their data and free from the lock-in created by proprietary vendor-owned formats.

To learn more about Databricks and Tabular joining forces, register to attend the Data + AI Summit, June 10-13: databricks.com/dataaisummit

By stp2y
