Glue cross-account setup


This document covers the detailed steps for querying an AWS Glue Data Catalog from Dremio in a cross-account setup using AWS Lake Formation.

Use-case
Account A – Dremio is deployed here, and the Glue database Glue_DB_A is created and added as a source in Dremio.

Account B – The Glue database Glue_DB_B is created here, and the data is located in an S3 bucket in this account.

The customer wants to share the Glue_DB_B catalog from Account B with Account A and query the data located in Account B from Dremio.

Setup Diagram

Role of each service in the given setup

  • Lake Formation – To create the data mesh, simplify cross-account data sharing, and create resource links

  • Resource Access Manager – To share resources and view the shared Data Catalog

  • IAM User – To provide cross-account read/write access to the S3 bucket so that queries can run from Dremio

  • Amazon Athena – Used only to test whether Lake Formation access is working

Steps

  • Resource Sharing using Lake Formation and Resource Access Manager

First, use Lake Formation and Resource Access Manager to share the Glue catalog from Account B with Account A.

Steps for Account-B:

  1. Create a Glue database named Glue_DB_B

  2. Create a Glue table in this database, point it to the S3 location where the data resides, and provide the schema
    OR
    Use a Glue crawler to read the data in S3 and create the Glue table for you.

  3. Go to Lake Formation console -> Data Lake Location -> Register same S3 location -> Use default IAM role -> AWSServiceRoleForLakeFormationDataAccess

  4. Go to Lake Formation -> Databases -> Select Glue_DB_B -> Actions -> Grant -> Fill in (External Account), put AWS Account-A ID -> Choose a specific table

For DB, grant Alter, Create table, Describe
For Table, grant Alter, Delete, Describe, Drop, Insert

  5. Go to Resource Access Manager console -> Shared by me in the left pane -> Resource Shares
    You should be able to view your shared resources
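The grants from step 4 can also be made programmatically. The following is a minimal sketch, assuming boto3 is installed and credentials for Account B are configured; the helper names (`database_grant`, `table_grant`, `apply_grants`) are hypothetical and only illustrate the request shape for the Lake Formation `grant_permissions` call with the permissions listed above.

```python
from typing import Any, Dict, List

# Permissions taken from the lists in step 4 above.
DB_PERMISSIONS = ["ALTER", "CREATE_TABLE", "DESCRIBE"]
TABLE_PERMISSIONS = ["ALTER", "DELETE", "DESCRIBE", "DROP", "INSERT"]


def database_grant(account_a_id: str, database: str) -> Dict[str, Any]:
    """Build grant_permissions arguments for a database-level grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_a_id},
        "Resource": {"Database": {"Name": database}},
        "Permissions": list(DB_PERMISSIONS),
    }


def table_grant(account_a_id: str, database: str, table: str) -> Dict[str, Any]:
    """Build grant_permissions arguments for a table-level grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_a_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(TABLE_PERMISSIONS),
    }


def apply_grants(grants: List[Dict[str, Any]]) -> None:
    """Send each grant to Lake Formation (run with Account B credentials)."""
    import boto3  # third-party; imported lazily so the builders work without it

    client = boto3.client("lakeformation")
    for grant in grants:
        client.grant_permissions(**grant)
```

For example, `apply_grants([database_grant("<AccountA-ID>", "Glue_DB_B"), table_grant("<AccountA-ID>", "Glue_DB_B", "<table-name>")])` would issue both grants; the placeholders must be replaced with your own values.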

Steps for Account-A:

  1. Go to Resource Access Manager → Shared with me → Resource Shares → Accept your Resource Share

  2. Go to Lake Formation -> Tables -> Your shared table will appear here -> Click on the table -> Actions -> Create resource link

  3. The resource link will now appear italicized in the Glue database
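The Account-A steps above can be sketched in code as well. This is an illustrative sketch, assuming boto3 and Account A credentials; the helper names are hypothetical. It builds the arguments for the RAM `accept_resource_share_invitation` call and for a Glue `create_table` call whose `TargetTable` points at the shared table, which is how a resource link is represented in the Glue API.

```python
from typing import Any, Dict


def accept_share_request(invitation_arn: str) -> Dict[str, str]:
    """Arguments for ram.accept_resource_share_invitation."""
    return {"resourceShareInvitationArn": invitation_arn}


def resource_link_request(
    local_database: str,
    link_name: str,
    account_b_id: str,
    shared_database: str,
    shared_table: str,
) -> Dict[str, Any]:
    """Arguments for glue.create_table that create a resource link.

    The link lives in Account A's database and targets the table
    shared from Account B's catalog.
    """
    return {
        "DatabaseName": local_database,
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": account_b_id,
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    }


def accept_and_link(invitation_arn: str, link: Dict[str, Any]) -> None:
    """Accept the share, then create the resource link (Account A credentials)."""
    import boto3  # third-party; imported lazily so the builders work without it

    boto3.client("ram").accept_resource_share_invitation(
        **accept_share_request(invitation_arn)
    )
    boto3.client("glue").create_table(**link)
```

The invitation ARN is not known in advance; it can be looked up in the Resource Access Manager console under Shared with me.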

Provide cross-account read/write access to the S3 bucket

Steps to do so:

  • Go to Account B → S3 console
  • Select your S3 bucket
  • Go to the Permissions tab
  • Edit Bucket Policy and add the following policy (make sure to add the AWS Account-A ID, IAM User name, and bucket name)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetLifecycleConfiguration",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>"
        }
    ]
}
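To avoid hand-editing the placeholders, the same policy can be generated from parameters. The sketch below assumes boto3 and Account B credentials for the optional upload step; `build_bucket_policy` and `put_policy` are hypothetical helper names, and the statements mirror the policy above.

```python
import json
from typing import Any, Dict


def build_bucket_policy(account_a_id: str, username: str, bucket: str) -> Dict[str, Any]:
    """Return the cross-account bucket policy shown above as a dict."""
    principal = {"AWS": f"arn:aws:iam::{account_a_id}:user/{username}"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access applies to keys inside the bucket.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Bucket-level access applies to the bucket itself.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetLifecycleConfiguration", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


def put_policy(bucket: str, policy: Dict[str, Any]) -> None:
    """Attach the policy to the bucket (run with Account B credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

Note that `put_bucket_policy` replaces any existing bucket policy, so merge these statements into the current policy first if the bucket already has one.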

Add Glue catalog as a source in Dremio

The last step is to add Glue_DB_A as a source in Dremio:

  • Go to Add Source
  • Select AWS Glue Data Catalog
  • Fill in the details – Name, Region, Authentication
  • Hit Save

You should now be able to view the datasets from both Glue catalogs and run queries on them.

Or

You can query the Glue source through Athena instead of Dremio to confirm that the Lake Formation sharing works.
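A quick Athena check could look like the following. This is a minimal sketch, assuming boto3, Account A credentials, and an S3 location that Athena may write results to; the helper names and the `LIMIT` value are illustrative choices, not part of the original setup.

```python
from typing import Any, Dict


def athena_query_request(
    database: str, table: str, output_location: str, limit: int = 10
) -> Dict[str, Any]:
    """Arguments for athena.start_query_execution for a small sample query."""
    return {
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT {limit}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Start the query and return its execution ID (Account A credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]
```

Here `table` would be the name of the resource link created in Account A, and `output_location` an `s3://` path of your choosing. If the query returns rows, the cross-account sharing is working.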




By stp2y
