Dremio Corp., the developer of a data lakehouse platform based on the Apache Iceberg table format, today is throwing its support behind the Polaris data catalog that cloud data warehousing firm Snowflake Inc. released to open source in June.
The move is an important third-party endorsement for Snowflake, which is battling rival Databricks Inc. to become the platform of choice for artificial intelligence development on top of the open-source Apache Iceberg table format.
Dremio said it would also support Databricks’ rival Unity Catalog managed service, but its endorsement was less full-throated than for Snowflake. Over time, Dremio said it intends to adopt Polaris as its primary catalog and fold in advanced features that won’t be available to Unity Catalog users.
Dremio also said its data catalog for Iceberg is now available for on-premises, cloud and hybrid cloud deployment. Previously, the software had only been provided as a managed service. Dremio claims to be the only lakehouse provider to support all three deployment options. A data lakehouse is a data storage architecture that combines the best features of flexible data lakes and high-performance data warehouses.
Data catalogs, which are organized inventories of data assets within an organization that help users discover, understand and access data efficiently, have been the source of a pitched battle between Snowflake and Databricks. Both claim to be more committed to supporting the popular open-source Iceberg format than the other and both say their catalogs will eventually be freely available as open source.
Databricks partially released its Unity Catalog to open source in June and says it intends to fully donate the platform to the community over time. Polaris is fully open-sourced under an Apache 2.0 license. Both companies offer managed versions of the open projects.
A nonissue
Dremio says it aims to make the choice of a catalog a nonissue by allowing customers to choose whichever one they prefer. However, there are some tradeoffs, at least in the short term. Dremio customers can read from but not write to Iceberg tables from Polaris or Unity Catalog. Write capabilities, column- and row-level access control and automated table management will require Dremio’s commercial catalog offering, which is based on the open-source Project Nessie.
The Dremio catalog enables centralized data governance with role-based access control and fine-grained access privileges. It automates table optimization tasks such as compaction and garbage collection and supports constructs such as branching, version control, virtual development environments and time travel, or the ability to view and query data as it existed in the past.
Dremio intends to merge its catalog with Polaris and add read/write capabilities to it, at which time Project Nessie will be retired. It has made no such commitment to Unity Catalog.
“We will treat Polaris as our catalog and we will merge Nessie into Polaris,” said Read Maloney, Dremio’s chief marketing officer. “It will be one default catalog for the open-source community and we will operate enterprise services around that. Customers should feel no difference.”
Project Nessie manages metadata and provides Git-like version-control capabilities for tracking changes across data assets. That allows users to branch, commit and revert changes to data without disrupting production environments, allowing for easier collaboration, reproducibility and governance. “The value of Nessie is that it enables DataOps practices for continuously integrating and continuously developing data pipeline code,” said Kevin Petrie, vice president of research at research firm BARC GmbH.
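To make the Git-like model concrete, here is a minimal sketch of a catalog with branches and commits. It is a toy in-memory illustration, not Nessie's actual API: the class `ToyCatalog`, its methods and the metadata file names are all hypothetical, and "revert" is simplified to moving a branch back to its parent commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """One catalog snapshot: table name -> metadata pointer (hypothetical)."""
    tables: Dict[str, str]
    parent: Optional[int]  # index of the parent commit, None for the root


class ToyCatalog:
    """Toy model of a versioned catalog; not the Nessie API."""

    def __init__(self) -> None:
        # Start with an empty root commit that "main" points to.
        self.commits: List[Commit] = [Commit(tables={}, parent=None)]
        self.branches: Dict[str, int] = {"main": 0}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A branch is just a named pointer to an existing commit.
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, table: str, pointer: str) -> None:
        # Copy the current snapshot, change one table, advance the branch.
        head = self.branches[branch]
        tables = dict(self.commits[head].tables)
        tables[table] = pointer
        self.commits.append(Commit(tables=tables, parent=head))
        self.branches[branch] = len(self.commits) - 1

    def revert(self, branch: str) -> None:
        # Simplified revert: move the branch back to its parent commit.
        parent = self.commits[self.branches[branch]].parent
        if parent is not None:
            self.branches[branch] = parent

    def read(self, branch: str) -> Dict[str, str]:
        # Return the tables visible on this branch.
        return dict(self.commits[self.branches[branch]].tables)


catalog = ToyCatalog()
catalog.commit("main", "orders", "orders-v1.json")
catalog.create_branch("dev")
catalog.commit("dev", "orders", "orders-v2.json")  # production is untouched
print(catalog.read("main"))
print(catalog.read("dev"))
catalog.revert("dev")  # undo the experimental change
print(catalog.read("dev"))
```

In this sketch, changes committed on the `dev` branch never alter what readers of `main` see, which is the property that lets teams experiment without disrupting production.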
Dremio, which is an elite Snowflake partner, is clearly seeking to tip the scales toward Polaris. “The challenge with Unity is that it is largely a vendor-backed project,” Maloney said. “If you want governance across all those tools, that’s done within the vendor ecosystem. Polaris is the same in the open source and managed versions. There are already 20 vendors actively working to contribute to Polaris and that number will go way up. Do you want the open-source catalog backed by one or two vendors or by the community? We believe customers will go with one that’s community-backed.”
The move also plays to Dremio’s strength as a data lakehouse Switzerland, Petrie said. “Dremio differentiates itself by supporting hybrid environments,” he said. “Its catalog can inventory datasets on many platforms: Databricks or Snowflake in the cloud, data warehouses on-premises and so on.”
Photo: Pixabay