This document covers the detailed steps for querying an AWS Glue Data Catalog from Dremio in a cross-account setup using AWS Lake Formation.
Use-case
Account A – Dremio is deployed here, and the Glue database Glue_DB_A is created and added as a source in Dremio.
Account B – The Glue database Glue_DB_B is created here, and the data is located in an S3 bucket in this account.
The customer wants to share the Glue_DB_B catalog with Glue_DB_A and query the data located in Account B from Dremio.
Setup Diagram
Role of each service in this setup:
- Lake Formation – To create a data mesh, simplify cross-account data sharing, and create resource links
- Resource Access Manager – To share resources and view the shared Data Catalog
- IAM User – To provide cross-account read/write access to the S3 bucket so that queries can run from Dremio
- Amazon Athena – Only to test whether Lake Formation access is working
Steps
- Resource Sharing using Lake Formation and Resource Access Manager
First, use Lake Formation and Resource Access Manager to share the Glue catalog from Account B with Account A.
Steps for Account-B:
- Create a Glue database named Glue_DB_B
- Create a Glue table in this database, point it to the S3 location where the data resides, and provide the schema
  OR
  Use a Glue crawler to scan the data in S3 and add the Glue table for you
- Go to Lake Formation console -> Data lake locations -> Register the same S3 location -> Use the default IAM role, AWSServiceRoleForLakeFormationDataAccess
- Go to Lake Formation -> Databases -> Select Glue_DB_B -> Actions -> Grant -> Choose External account and enter the AWS Account-A ID -> Choose a specific table
  For the database, grant Alter, Create table, Describe
  For the table, grant Alter, Delete, Describe, Drop, Insert
- Go to Resource Access Manager console -> Shared by me in the left pane -> Resource Shares
  You should be able to view your shared resources here
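The console steps for Account B can also be scripted. The following is a minimal sketch, assuming boto3 is installed and configured with Account-B credentials that have Lake Formation admin rights. The account ID, table name, and S3 ARN are placeholder values chosen for illustration, and the permission lists simply mirror the grants listed above.

```python
# Sketch only: all identifiers below are hypothetical placeholders.
ACCOUNT_A_ID = "111111111111"                 # AWS Account-A ID (placeholder)
DATABASE = "Glue_DB_B"
TABLE = "my_table"                            # hypothetical table name
DATA_LOCATION_ARN = "arn:aws:s3:::my-bucket"  # hypothetical S3 location

# Permissions from the steps above, in Lake Formation API spelling.
DB_PERMISSIONS = ["ALTER", "CREATE_TABLE", "DESCRIBE"]
TABLE_PERMISSIONS = ["ALTER", "DELETE", "DESCRIBE", "DROP", "INSERT"]


def database_grant(account_id, database):
    """Build the grant_permissions arguments for the database-level grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_id},
        "Resource": {"Database": {"Name": database}},
        "Permissions": list(DB_PERMISSIONS),
    }


def table_grant(account_id, database, table):
    """Build the grant_permissions arguments for the table-level grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(TABLE_PERMISSIONS),
    }


def share_with_account_a():
    """Register the S3 location and grant Account A access (calls AWS)."""
    import boto3  # imported here so the builders above work without boto3

    lf = boto3.client("lakeformation")
    # Register the data location using the Lake Formation service-linked role.
    lf.register_resource(ResourceArn=DATA_LOCATION_ARN, UseServiceLinkedRole=True)
    lf.grant_permissions(**database_grant(ACCOUNT_A_ID, DATABASE))
    lf.grant_permissions(**table_grant(ACCOUNT_A_ID, DATABASE, TABLE))
```

Calling share_with_account_a() performs the registration and both grants. The builder functions are kept separate so the request payloads can be inspected before anything is sent to AWS.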
Steps for Account-A:
- Go to Resource Access Manager -> Shared with me -> Resource Shares -> Accept your resource share
- Now go to Lake Formation -> Tables -> Your shared table will appear here -> Click on the table -> Actions -> Create resource link
- The table will now appear italicized in the Glue database
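The Account-A steps can be sketched with boto3 as well. This example assumes Account-A credentials, and every database, table, and account value passed in is a placeholder. It accepts pending Resource Access Manager invitations and then creates the resource link as a Glue table whose TargetTable points at the shared table in Account B. Pagination of the invitation list is ignored for brevity.

```python
def pending_invitation_arns(invitations):
    """Return the ARNs of invitations that are still waiting to be accepted."""
    return [
        inv["resourceShareInvitationArn"]
        for inv in invitations
        if inv.get("status") == "PENDING"
    ]


def resource_link_request(local_db, link_name, owner_account_id, shared_db, shared_table):
    """Build glue.create_table arguments for a resource link to a shared table."""
    return {
        "DatabaseName": local_db,
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": owner_account_id,  # Account-B ID owns the table
                "DatabaseName": shared_db,
                "Name": shared_table,
            },
        },
    }


def accept_and_link(local_db, link_name, owner_account_id, shared_db, shared_table):
    """Accept pending shares, then create the resource link (calls AWS)."""
    import boto3  # imported here so the helpers above work without boto3

    ram = boto3.client("ram")
    invitations = ram.get_resource_share_invitations()["resourceShareInvitations"]
    for arn in pending_invitation_arns(invitations):
        ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)

    glue = boto3.client("glue")
    glue.create_table(
        **resource_link_request(
            local_db, link_name, owner_account_id, shared_db, shared_table
        )
    )
```

Note that this sketch accepts every pending invitation it finds. In an account that receives shares from several sources, filter the invitations by sender before accepting.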
- Provide cross-account read/write access to the S3 bucket
Steps:
- Go to Account B → S3 console
- Select your S3 bucket
- Go to the Permissions tab
- Edit the bucket policy and add the following policy, replacing <AccountA-ID>, <username>, and <bucket-name> with the AWS Account-A ID, the IAM user name, and the bucket name
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetLifecycleConfiguration",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>"
        }
    ]
}
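To avoid hand-editing the placeholders, the same policy can be generated from the three values. This is a small sketch using only the Python standard library. The function name build_bucket_policy and the example values in the usage note are illustrative, not part of any AWS API.

```python
import json


def build_bucket_policy(account_a_id, username, bucket_name):
    """Return the cross-account bucket policy above as a Python dict."""
    principal = {"AWS": f"arn:aws:iam::{account_a_id}:user/{username}"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access: applies to the keys inside the bucket.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Bucket-level access: applies to the bucket itself.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetLifecycleConfiguration", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


def policy_as_json(account_a_id, username, bucket_name):
    """Serialize the policy for pasting into the S3 console."""
    return json.dumps(
        build_bucket_policy(account_a_id, username, bucket_name), indent=4
    )
```

For example, print(policy_as_json("111111111111", "dremio-user", "my-bucket")) prints a policy that can be pasted into the bucket policy editor. The same string could also be passed as the Policy argument of the boto3 S3 client's put_bucket_policy call when run with Account-B credentials.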
- Add Glue catalog as a source in Dremio
The last step is to add Glue_DB_A as a source in Dremio:
- Go to Add Source
- Select AWS Glue Data Catalog
- Fill in the details – Name, Region, Authentication
- Hit Save
You should now be able to view the datasets from both Glue catalogs and run queries on them.
Alternatively, you can run the query on the Glue source via Athena instead of Dremio.
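As a rough illustration of the Athena check, the sketch below starts a small test query through boto3 and polls until it finishes. It assumes Account-A credentials. The database name, resource link name, and S3 results location are hypothetical values the caller must supply.

```python
import time


def athena_query_request(database, table, output_location, limit=10):
    """Build start_query_execution arguments for a small test query."""
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_test_query(database, table, output_location, poll_seconds=1.0):
    """Run the test query and return its final state (calls AWS)."""
    import boto3  # imported here so the builder above works without boto3

    athena = boto3.client("athena")
    request = athena_query_request(database, table, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
```

A SUCCEEDED state for a query against the resource link suggests that the Lake Formation share and resource link are working. A FAILED state usually points back to the grants or the share acceptance steps.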