How Delta Sharing Enables Secure End-to-End Collaboration | Databricks Blog


In today’s digital landscape, secure data sharing is critical to operational efficiency and innovation. Databricks and the Linux Foundation developed Delta Sharing as the first open source approach to data sharing across data, analytics and AI. Databricks provides secure data exchange, facilitating sharing across platforms, clouds and regions. Enterprises of all sizes trust Delta Sharing, which supports a broad spectrum of applications and diverse data formats. This flexibility makes it a reliable tool for organizations seeking to harness the full potential of their data assets.

In this blog, we will review Delta Sharing’s security architecture through three sharing scenarios: Databricks customer to Databricks customer (D2D), Databricks customer to open sharing (D2O), and cross-cloud data sharing. We will also summarize the benefits of implementing Delta Sharing as part of a modern data collaboration strategy, such as enhanced operational efficiency through streamlined, secure data exchanges across platforms and clouds, and reduced complexity and risk. This secure framework accelerates time to insight, enabling quicker decision-making while maintaining robust privacy protections that foster trust among stakeholders. Additionally, Delta Sharing’s flexibility supports a diverse range of data formats and applications, making it securely adaptable to evolving business needs. Each scenario includes a customer testimonial that offers a first-hand account of the solution’s impact. This blog focuses on Databricks Delta Sharing, where the data provider uses the managed version of the Databricks platform.

Databricks to Databricks Data Sharing (D2D)

The D2D scenario exemplifies secure, streamlined data exchange between two Databricks customers within the Databricks ecosystem. It features Databricks-managed connections and a no-token exchange system, ensuring both simplicity and security.

Using D2D sharing, customers benefit from Delta Sharing’s native integration with Unity Catalog (UC), which provides unified governance and security for sharing operations. Sharing is not limited to data: Unity Catalog goes beyond datasets to include volumes, notebooks, and AI models. Delta Sharing for intra-account sharing is turned on by default, while external sharing is available once it is activated by a user with the required admin-level access. To set up Databricks Delta Sharing, you need at least one Databricks workspace that is enabled for Unity Catalog and attached to a metastore, along with an admin role or the CREATE SHARE and CREATE RECIPIENT privileges (see the documentation for account setup).

Unity Catalog provides a unified governance layer throughout, from the initial steps of creating a recipient and establishing shares to the crucial act of granting access. The Delta Sharing service processes API requests, conducts thorough authorization checks, and keeps detailed activity logs. Together, these steps keep sharing operations both transparent and secure.
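The create-share, create-recipient, and grant steps described above can be sketched as a small helper that assembles the corresponding SQL statements. This is a minimal sketch, not an official setup script: the statement shapes follow Databricks SQL as we understand it, and the share, table, recipient, and sharing identifier values are hypothetical placeholders. Check the Delta Sharing documentation before running anything like this against a real metastore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareGrant:
    share: str       # hypothetical share name
    table: str       # three-level Unity Catalog name: catalog.schema.table
    recipient: str   # hypothetical recipient name
    sharing_id: str  # the recipient's sharing identifier (placeholder value)


def d2d_sharing_statements(grant: ShareGrant) -> list[str]:
    """Return SQL for the create-share, create-recipient, grant-access flow."""
    return [
        # 1. Create the share, the container for the assets being shared.
        f"CREATE SHARE IF NOT EXISTS {grant.share}",
        # 2. Add a table to the share.
        f"ALTER SHARE {grant.share} ADD TABLE {grant.table}",
        # 3. Create a D2D recipient from the sharing identifier they provide.
        f"CREATE RECIPIENT IF NOT EXISTS {grant.recipient} USING ID '{grant.sharing_id}'",
        # 4. Grant the recipient read access to the share.
        f"GRANT SELECT ON SHARE {grant.share} TO RECIPIENT {grant.recipient}",
    ]


# Example with made-up names; print the statements for review.
example = ShareGrant(
    share="sales_share",
    table="main.sales.orders",
    recipient="partner_bank",
    sharing_id="<sharing-identifier>",
)
for statement in d2d_sharing_statements(example):
    print(statement + ";")
```

Generating the statements as plain strings keeps the sketch reviewable: an admin can inspect the output before executing it in a SQL editor or notebook.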

Data Access: Unity Catalog is also central to data access after authorization. Once Unity Catalog authorizes a request, the method of access is determined, either cloud tokens or pre-signed URLs, based on factors such as asset type and sharing arrangement. For cloud tokens, a read-only scoped-down SAS token is minted by the provider’s UC and then forwarded to the recipient’s compute plane. This provides secure, limited-time storage access to the table root directory. Similarly, with pre-signed URLs, a list of relevant URLs is created and sent to the recipient’s compute plane, providing secure, temporary access to the storage files. By using the security features of each cloud service, such as Azure SAS tokens and AWS pre-signed URLs, you can ensure that only authorized individuals can access the data across regions and clouds. Moreover, the interactions are confined to the recipient’s and provider’s control planes, and minting credentials is a privileged operation that cannot be triggered by external agents, which protects against external breaches. This design keeps data sharing both flexible and secure across a wide array of business needs.
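To make the idea of a short-lived, read-only URL concrete, here is a simplified sketch of how a signed URL can carry its own expiry. It is an illustration only: real cloud providers use their own signing schemes (for example Azure SAS or AWS Signature Version 4), and the host name, secret key, and query parameter names below are assumptions made for this example.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-only-secret"        # hypothetical; real keys stay with the storage service
STORAGE_HOST = "https://storage.example.com"  # hypothetical host


def _signature(path: str, expires_at: int, key: bytes) -> str:
    # Sign the method, object path, and expiry so none of them can be altered.
    message = f"GET\n{path}\n{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Return a read-only URL that embeds its expiry time and a signature."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, key)})
    return f"{STORAGE_HOST}{path}?{query}"


def is_valid(url: str, now: int, key: bytes = SECRET_KEY) -> bool:
    """Accept the URL only if it is unexpired and its signature matches."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        provided = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now >= expires_at:
        return False
    expected = _signature(parsed.path, expires_at, key)
    return hmac.compare_digest(provided, expected)
```

In this sketch, a URL signed with `expires_at=1000` passes `is_valid` at `now=999` and fails at `now=1000`, and changing the file path in the URL invalidates the signature. That is the property the paragraph above relies on: access is scoped to specific files and lapses on its own.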

Coastal Community Bank selected Delta Sharing to meet the rigorous data sharing, compliance and security demands of its network of partners. Coastal chose Cavallo Technologies to help develop a modern data platform. Rob Cavallo, President at Cavallo Technologies, explains that Coastal needed a flexible solution for now and into the future. Read the Coastal Community Bank case study.

“In some ways, Coastal [Community Bank] was asking for a paradox: enable easy collaboration yet meet the highest security standards for consumer financial data. It’s critical to ensure the platform is performant and cost-effective for today’s workloads while also adaptable enough to handle future use cases not yet imagined. In the end, the Databricks Data Intelligence Platform was the only platform we found that empowered us to do that.”

— Rob Cavallo, President at Cavallo Technologies

Secure Data Sharing, Beyond Tables

Delta Sharing supports more than just tabular data, embracing a more holistic approach to data collaboration with the inclusion of non-tabular data assets such as volumes, notebooks, and AI models. These asset types are currently only supported in the D2D sharing framework, where they enhance the collaborative ecosystem. AI models are shared in a similar manner to volumes, while notebooks feature a unique sharing mechanism. Notebooks can be previewed by recipients through a pre-signed URL, rendering the content as HTML in a pop-up window for immediate access. For deeper integration, notebooks can also be imported into the recipient’s environment, utilizing base64 encoding and API calls for a seamless transition.
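The base64 step mentioned above can be illustrated with a short round trip. This is a sketch under assumptions: the payload field names are loosely modeled on a workspace import request and are not taken from this post, and the target path is hypothetical.

```python
import base64


def build_import_payload(notebook_source: str, target_path: str) -> dict:
    """Wrap notebook source in a base64-encoded, JSON-friendly payload."""
    encoded = base64.b64encode(notebook_source.encode("utf-8")).decode("ascii")
    return {
        "path": target_path,   # hypothetical destination in the recipient's workspace
        "format": "SOURCE",    # assumed field: the content is plain source code
        "language": "PYTHON",  # assumed field
        "content": encoded,    # base64 text survives JSON transport unchanged
        "overwrite": False,
    }


def decode_payload(payload: dict) -> str:
    """Recover the original notebook source from the payload."""
    return base64.b64decode(payload["content"]).decode("utf-8")


source = "print('hello from a shared notebook')\n"
payload = build_import_payload(source, "/Users/someone@example.com/shared_notebook")
print(decode_payload(payload) == source)
```

Base64 is used here because it lets arbitrary notebook content, including non-ASCII characters, travel safely inside a JSON API call.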

AI model sharing is facilitated by generating a secure, read-only scoped-down SAS token that is minted by the provider’s UC and then forwarded to the recipient’s compute plane. This approach ensures secure and efficient access and avoids the need for extraneous copies of the model by allowing a one-time copy to the Model Registry in the recipient’s UC. This copy of the model can then be deployed to multiple regions to optimize inference, reducing latency and delivering faster response times by using regional data centers closer to end users. Discovering, accessing, and using shared volumes and AI models with Delta Sharing follows approaches that are similar in outline but tailored to each data type, promoting a secure and versatile platform for data sharing and collaboration.

Databricks to Open Data Sharing (D2O)

Transitioning to the open sharing scenario, D2O upholds strict security protocols for a Databricks customer sharing data with external third-party users not on Databricks. D2O enables recipients to directly connect to shared data using Delta Sharing connectors that support various systems like pandas, Tableau, Apache Spark, Rust, or others that support the open protocol, without first needing a specific compute platform.

Upon creating an open recipient in Databricks, a secure, one-time activation URL is generated, allowing the recipient to download a credential file that contains a Delta Sharing endpoint address and a token. In case of a security breach, providers have the ability to take immediate action, such as changing a recipient’s credentials or withdrawing their read permissions to prevent any further issues.

Data Access Workflow: When a recipient queries a shared table using one of these mentioned connectors, Delta Sharing verifies the recipient using tokens from the credential file, and provides pre-signed URLs for accessing the data. This approach ensures compatibility with various open source connectors, safeguarding the integrity and security of the shared assets. (See more on sharing and accessing data.)
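The credential file and token-based verification described above can be sketched with the standard library alone. This is a minimal sketch, assuming the profile file is JSON with `endpoint`, `bearerToken`, and an optional `expirationTime` field as in the open Delta Sharing protocol. The endpoint, token, and file name are made up, and no network call is made.

```python
import json
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Profile:
    endpoint: str
    bearer_token: str
    expiration_time: Optional[datetime]


def parse_profile(text: str) -> Profile:
    """Parse the downloaded credential (profile) file."""
    raw = json.loads(text)
    expires = raw.get("expirationTime")
    return Profile(
        endpoint=raw["endpoint"].rstrip("/"),
        bearer_token=raw["bearerToken"],
        expiration_time=datetime.fromisoformat(expires.replace("Z", "+00:00")) if expires else None,
    )


def is_expired(profile: Profile, now: Optional[datetime] = None) -> bool:
    """True once the token's expiration time, if any, has passed."""
    now = now or datetime.now(timezone.utc)
    return profile.expiration_time is not None and now >= profile.expiration_time


def list_shares_request(profile: Profile) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to list shares."""
    return urllib.request.Request(
        url=f"{profile.endpoint}/shares",
        headers={"Authorization": f"Bearer {profile.bearer_token}"},
        method="GET",
    )


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Address a shared table as <profile-file>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"
```

A connector would pass something like `table_url("config.share", "sales_share", "default", "orders")` to its load function. If the provider rotates or revokes the token, requests built from the old profile stop being authorized, which is what makes the revocation step described earlier effective.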

Cox Automotive Europe is part of Cox Automotive, the world’s largest automotive service organization. It uses Delta Sharing to centrally manage and audit data shared outside its enterprise data services team, while ensuring robust security and governance. Read the Cox Automotive case study.

“Delta Sharing makes it easy to securely share data with business units and subsidiaries without copying or replicating it. It enables us to share data without the recipient having an identity in our workspace.”

— Robert Hamlet, Lead Data Engineer at Cox Automotive

Cross-Cloud Data Sharing

Enterprises are increasingly adopting cross-cloud strategies, driven by the need to support diverse functionalities across different cloud platforms, facilitate partnerships, or integrate data from another organization, post-acquisition. This shift toward a multicloud environment underscores the importance of organizations implementing robust solutions like Delta Sharing to enable seamless and secure sharing both internally and externally. Implementing a cross-cloud strategy is often essential for our clients to maintain operational continuity, foster innovation, and drive growth in an interconnected digital ecosystem, while having the ability to leverage the unique strengths of each cloud service.

For many of our clients who adopt cross-cloud strategies, Delta Sharing’s open, cross-platform sharing capabilities, which support multicloud environments, are a clear differentiator. Delta Sharing is equally effective whether sharing data internally within a single cloud or externally across multiple cloud platforms, ensuring a secure and efficient data exchange process in both scenarios. Databricks has heard from many customers about their data sharing needs within multicloud environments and how Delta Sharing helps promote interoperability and enhance security across their cloud ecosystem.

One of these Databricks customers is Deutsche Börse, an international exchange organization and market infrastructure provider. Once they implemented Delta Sharing, which enabled them to openly share and collaborate with their customers, the business impact was transformative.

“Having a platform that allows secure data sharing with fine-grained access controls, the highest security standards, and privacy assurance opens up new possibilities. We can now engage in conversations on customized solutions where in the past, we would have said, ‘Unfortunately, our clients don’t want to share their data and models with us, or we don’t want to share more granular data or our models for confidentiality reasons.'”

— Jan Stiebing, head of Business Strategy and M&A at Deutsche Börse

In this customer example and in many others, Delta Sharing is able to bridge gaps for data sharing and collaboration that were once considered insurmountable, all while maintaining the highest standards of security and privacy. Deutsche Börse also offers several market data listings on Databricks Marketplace.

Network and Storage Configuration

Delta Sharing enables secure data sharing across cloud environments by integrating with each cloud’s native storage security architecture, without requiring significant modifications to your existing security framework. This approach is designed for organizations using Databricks on cloud platforms such as Azure, AWS, and GCP, and aligns with Unity Catalog’s requirements. The Databricks Data Intelligence Platform supports data sharing through cloud storage solutions (ADLS Gen2, S3, GCS), with an emphasis on private communication channels or IP address whitelisting for enhanced security.

The network and storage configuration for Delta Sharing outlined below works across both intra-cloud and cross-cloud scenarios. Intra-cloud sharing facilitates secure data exchange within the same cloud ecosystem using private endpoints, storage firewalls, and network gateways, ensuring no public access is allowed. In cross-cloud sharing scenarios, Delta Sharing leverages NAT gateway egress IPs and supports existing cross-cloud private connections, such as site-to-site VPNs or dedicated links, to enable secure data access across different cloud platforms and on-premises networks. This approach allows a wide range of network infrastructures to take part in Delta Sharing while keeping access both flexible and secure.
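The storage-firewall side of this configuration amounts to an allowlist check on the caller’s source IP. The sketch below shows that logic with Python’s `ipaddress` module. The ranges are documentation-only example addresses standing in for a recipient’s NAT gateway egress IPs, not real values, and in practice the check is enforced by the cloud storage firewall rather than by application code.

```python
import ipaddress

# Hypothetical allowlist: the recipient's NAT gateway egress IP and a small range.
ALLOWED_EGRESS_RANGES = [
    ipaddress.ip_network("203.0.113.10/32"),
    ipaddress.ip_network("198.51.100.0/28"),
]


def is_allowed(source_ip: str, allowed=ALLOWED_EGRESS_RANGES) -> bool:
    """Return True if the request's source IP falls inside an allowed range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in allowed)


for ip in ("203.0.113.10", "203.0.113.11", "198.51.100.7"):
    print(ip, "allowed" if is_allowed(ip) else "blocked")
```

Routing recipient traffic through a NAT gateway gives it a stable egress address, which is what makes a short allowlist like this practical for cross-cloud access.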

[Diagram: Network and Storage Configuration]

The above diagram represents a cross-cloud network configuration example.

Data Filtering

In Delta Sharing, data filtering is crucial for providing flexible and secure access, with two primary methods:

  • Partition Filtering: Enables sharing specific table partitions that align with recipient properties, known as parameterized partition sharing. This strategy allows data providers to share the needed data portions in a flexible manner, facilitating controlled access.
  • Dynamic Views: Enables sharing of any subset of data with recipients via dynamic functions such as current_recipient, offering fine-grained control over data access and improved manageability.

Both methods allow access restrictions based on specific recipient properties, ensuring data is shared only with intended recipients and in the appropriate context. These approaches enhance Delta Sharing’s security and flexibility, allowing for tailored data access that meets unique recipient needs.
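The effect of filtering on recipient properties can be simulated in a few lines. This sketch does not use Delta Sharing itself: the rows, the `country` column, and the recipient property names are invented for illustration, and the function only mimics what a partition filter or a dynamic view keyed on the current recipient would return.

```python
def rows_for_recipient(rows, recipient_properties, column, property_key):
    """Return only the rows whose column value matches the recipient's property.

    A recipient without the property sees nothing, so access fails closed.
    """
    value = recipient_properties.get(property_key)
    if value is None:
        return []
    return [row for row in rows if row.get(column) == value]


# Made-up table partitioned by country.
orders = [
    {"order_id": 1, "country": "DE", "amount": 120},
    {"order_id": 2, "country": "US", "amount": 80},
    {"order_id": 3, "country": "DE", "amount": 45},
]

# Hypothetical recipients with properties set by the provider.
german_partner = {"country": "DE"}
unconfigured_partner = {}

print(rows_for_recipient(orders, german_partner, "country", "country"))
print(rows_for_recipient(orders, unconfigured_partner, "country", "country"))
```

The same share definition serves every recipient, and each one sees only the slice that matches their own properties, which is why the provider does not need to maintain one share per recipient.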

Security, Flexibility, and Seamless Integration with Delta Sharing

In conclusion, Delta Sharing is a key component of the Databricks Data Intelligence Platform and stands out for its secure, flexible, and cross-platform data sharing capabilities, supporting modern data strategies. In addition to supporting other platforms via open-source connectors, Delta Sharing enables customers to share structured and unstructured data, as well as AI models. All of these capabilities clearly differentiate Delta Sharing from other data exchange platforms. As a result, Delta Sharing is widely trusted by clients across different industries, reflected in customer testimonials, highlighting the significant impact on operational efficiency and innovation. As the data sharing landscape continues to evolve, Delta Sharing is built for the future, prioritizing security, flexibility, and seamless integration across diverse data sharing ecosystems. This steadfast commitment positions Delta Sharing as an indispensable asset in harnessing the power of data to advance the digital objectives of enterprises worldwide.

To learn more about how to implement Delta Sharing within your organization, check out the latest resources including new eBooks and related blogs below, or deep dive into the Delta Sharing documentation.

If you are already a Delta Sharing customer, you can also reach out to the team with questions or to provide feedback at [email protected].




By stp2y
