Announcing Public Preview of Hive Metastore and AWS Glue Federation in Unity Catalog


We’re excited to announce the Public Preview of Hive Metastore (HMS) and AWS Glue Federation in Unity Catalog! This new capability enables Unity Catalog to access and govern tables stored in Hive Metastores (whether internal to Databricks or external) as well as in AWS Glue. It represents a key milestone in our Lakehouse Federation vision, which brings external data sources, including databases, data warehouses and catalogs, together under a unified governance framework with Unity Catalog. You can discover, govern and query all your data from a single, centralized platform, regardless of format and location. This not only fosters open access and collaboration across your organization but also extends data intelligence to every data source.

In this blog, we’ll explore the benefits of HMS and AWS Glue Federation, explain how it works, and provide guidance on getting started.

Why Hive Metastore and AWS Glue Federation? 

HMS was an early standard for cataloging data in big data systems. While it provides foundational functionality, it is not well suited to modern data and AI workloads, which demand comprehensive governance: fine-grained access controls on rows and columns, lineage, monitoring and auditing across all data and AI assets in one place.

Unity Catalog addresses these shortcomings by providing the industry’s only unified, open governance solution for managing all data and AI assets. It enables organizations to create an enterprise catalog that curates files, tables, ML models, AI tools, notebooks, and metrics, all governed with fine-grained access controls, lineage, monitoring, auditing and cross-platform sharing in one solution. More than 10,000 enterprises are now leveraging Unity Catalog to govern their data estate.

HMS and AWS Glue Federation provide significant benefits for organizations with HMS deeply embedded in their data architecture. For those with long-standing HMS or AWS Glue deployments, this capability offers a seamless path to leverage Unity Catalog’s advanced features over data stored in the HMS or Glue metastore. It ensures operational continuity by enabling organizations to sustain legacy workflows while gradually upgrading existing data and workspaces to Unity Catalog.

Key benefits include:

  • Seamless integration: Connect your existing HMS and AWS Glue catalogs directly to Unity Catalog without requiring manual metadata migration.
  • Simplified data discovery: Access and explore metadata from HMS and AWS Glue through a unified interface, alongside other data and AI assets in Unity Catalog.
  • Comprehensive governance: Leverage Unity Catalog’s fine-grained access controls, tagging, classification, lineage, and audit capabilities on top of the data stored in HMS and AWS Glue.

“We have years’ worth of datasets that are cataloged in an external Hive Metastore. HMS Federation allows us to immediately benefit from Unity Catalog-only features like robust access control and self-serve AI tooling through Genie Spaces, without the overhead of migrating all of these tables into Unity Catalog.”

— James Davidheiser, Technical Lead, Data Infrastructure, Asana

How it works

Unity Catalog now includes federation connectors for Hive Metastore (HMS) and AWS Glue, serving as a translation layer between Unity Catalog and your external metastores. These connectors let you mount entire HMS catalogs (both internal and external) or AWS Glue as foreign catalogs within Unity Catalog, making them appear as native objects. You can define fine-grained access controls, view lineage, perform audits, and query HMS or AWS Glue managed tables using the Databricks engine. The federation supports both reading and writing to tables in internal HMS within Databricks workspaces while offering read-only access for tables in external HMS and AWS Glue.
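To make the access model above concrete, here is a minimal Python sketch. It is not part of any Databricks API: the ForeignCatalog class, the MetastoreKind enum, and the catalog, schema, table, and group names are all hypothetical. It only builds strings, namely the three-level catalog.schema.table name used to address a federated table and the Unity Catalog GRANT statements you might issue, assuming SELECT covers reads and MODIFY covers writes.

```python
from dataclasses import dataclass
from enum import Enum


class MetastoreKind(Enum):
    """Kinds of federated metastore mentioned in this post."""
    INTERNAL_HMS = "internal_hms"   # Hive Metastore internal to a Databricks workspace
    EXTERNAL_HMS = "external_hms"   # Hive Metastore hosted outside Databricks
    AWS_GLUE = "aws_glue"


# Per the description above: internal HMS is read/write,
# external HMS and AWS Glue are read-only.
WRITABLE_KINDS = {MetastoreKind.INTERNAL_HMS}


@dataclass(frozen=True)
class ForeignCatalog:
    """Hypothetical model of a foreign catalog mounted in Unity Catalog."""
    name: str
    kind: MetastoreKind

    def table(self, schema: str, table: str) -> str:
        """Return the three-level name: catalog.schema.table."""
        return f"{self.name}.{schema}.{table}"

    def grant_statements(self, schema: str, table: str, principal: str) -> list[str]:
        """Build GRANT statements that match the source's access mode."""
        target = self.table(schema, table)
        privileges = ["SELECT"]
        if self.kind in WRITABLE_KINDS:
            privileges.append("MODIFY")  # writes only make sense for internal HMS
        return [f"GRANT {p} ON TABLE {target} TO `{principal}`" for p in privileges]


# Illustrative names only; none of these come from the original post.
glue = ForeignCatalog(name="glue_sales", kind=MetastoreKind.AWS_GLUE)
legacy = ForeignCatalog(name="legacy_hms", kind=MetastoreKind.INTERNAL_HMS)

for statement in glue.grant_statements("retail", "orders", "analysts"):
    print(statement)
for statement in legacy.grant_statements("retail", "orders", "analysts"):
    print(statement)
```

The generated statements could then be run in a Databricks SQL editor or notebook. Creating the connection and the foreign catalog themselves is covered in the federation guides and is deliberately left out of this sketch.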

With this capability, you can read all tables in HMS and AWS Glue, whether Parquet, Delta, or Iceberg (coming soon in Public Preview), so you can access and govern all your tables in one place.

Figure: HMS and AWS Glue Federation in Unity Catalog

Check out the video tutorial below to explore AWS Glue and HMS Federation in action.

Get started

By embracing Unity Catalog as the cornerstone of your Lakehouse architecture, you can unlock the power of a unified and open governance implementation that spans your entire data and AI estate.

  • Follow the HMS Federation guides (AWS, Azure and GCP) to get started.
  • To get started with Unity Catalog, follow the Unity Catalog guides available for AWS, Azure, and GCP.




By stp2y
