Effectively manage foundation models for generative AI applications with Amazon SageMaker Model Registry

Generative artificial intelligence (AI) foundation models (FMs) are gaining popularity with businesses due to their versatility and potential to address a variety of use cases. The true value of FMs is realized when they are adapted to domain-specific data. Managing these models across the business and the model lifecycle can introduce complexity. As FMs are adapted to different domains and data, operationalizing these pipelines becomes critical.

Amazon SageMaker, a fully managed service to build, train, and deploy machine learning (ML) models, has seen increased adoption to customize and deploy FMs that power generative AI applications. SageMaker provides rich features to build automated workflows for deploying models at scale. One of the key features that enables operational excellence around model management is the Model Registry. Model Registry helps catalog and manage model versions and facilitates collaboration and governance. When a model is trained and evaluated for performance, it can be stored in the Model Registry for model management.

Amazon SageMaker has released new features in Model Registry that make it easy to version and catalog FMs. Customers can use SageMaker to train or tune FMs, including Amazon SageMaker JumpStart and Amazon Bedrock models, and also manage these models within Model Registry. As customers begin to scale generative AI applications across various use cases such as fine-tuning for domain-specific tasks, the number of models can quickly grow. To keep track of models, versions, and associated metadata, SageMaker Model Registry can be used as an inventory of models.

In this post, we explore the new features of Model Registry that streamline FM management: you can now register unzipped model artifacts and pass an End User License Agreement (EULA) acceptance flag without needing users to intervene.

Overview

Model Registry has worked well for traditional models, which are smaller in size. For FMs, there were challenges because of their size and requirements for user intervention for EULA acceptance. With the new features in Model Registry, it’s become easier to register a fine-tuned FM within Model Registry, which then can be deployed for actual use.

A typical model development lifecycle is an iterative process. We conduct many experimentation cycles to achieve the expected performance from the model. Once trained, these models can be registered in the Model Registry, where they are cataloged as versions. The models can be organized in groups, the versions can be compared on their quality metrics, and each model can have an associated approval status indicating whether it's deployable.

Once the model is manually approved, a continuous integration and continuous deployment (CI/CD) pipeline can be triggered to deploy the model to production. Optionally, Model Registry can be used as a repository of models that are approved for use by an enterprise. Various teams can then deploy these approved models from Model Registry and build applications around them.
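
To make the approval gate concrete, the following is a minimal sketch of how a deployment step might choose which version to ship. It assumes version records shaped like trimmed-down entries from a model package listing, carrying ModelPackageVersion and ModelApprovalStatus. The latest_approved helper and the sample records are hypothetical and only illustrate the selection logic; they are not part of the SageMaker SDK.

```python
from typing import Dict, List, Optional

# Approval status strings used for model versions in Model Registry.
APPROVED = "Approved"
REJECTED = "Rejected"
PENDING = "PendingManualApproval"


def latest_approved(versions: List[Dict]) -> Optional[Dict]:
    """Return the highest-numbered approved version, or None if there is none."""
    approved = [v for v in versions if v.get("ModelApprovalStatus") == APPROVED]
    return max(approved, key=lambda v: v["ModelPackageVersion"], default=None)


# Hypothetical version records for one model group.
versions = [
    {"ModelPackageVersion": 1, "ModelApprovalStatus": APPROVED},
    {"ModelPackageVersion": 2, "ModelApprovalStatus": REJECTED},
    {"ModelPackageVersion": 3, "ModelApprovalStatus": PENDING},
]

candidate = latest_approved(versions)
print(candidate)
```

A CI/CD pipeline could run a check like this after an approval event and deploy only when a candidate is returned.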

An example workflow could follow these steps and is shown in the following diagram:

Select a SageMaker JumpStart model and register it in Model Registry.
Alternatively, fine-tune a SageMaker JumpStart model.
Evaluate the model with SageMaker model evaluation. SageMaker allows for human evaluation if desired.
Create a model group in the Model Registry. For each run, create a model version. Add your model group into one or more Model Registry Collections, which can be used to group registered models that are related to each other. For example, you could have a collection of large language models (LLMs) and another collection of diffusion models.
Deploy the models as SageMaker Inference endpoints that can be consumed by generative AI applications.

Figure 1: Model Registry workflow for foundation models
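
The relationship between collections, model groups, and versions in the workflow above can be sketched with a small in-memory model. This is purely a local illustration of the hierarchy, not the SageMaker API: the Collection and ModelGroup classes, the group name, and the eval_loss values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelGroup:
    """A named group; each registered run becomes a new numbered version."""
    name: str
    versions: List[dict] = field(default_factory=list)

    def add_version(self, metadata: dict) -> int:
        # Versions are numbered sequentially, starting at 1.
        version = len(self.versions) + 1
        self.versions.append({"version": version, **metadata})
        return version


@dataclass
class Collection:
    """A collection groups related model groups, for example all LLMs."""
    name: str
    groups: Dict[str, ModelGroup] = field(default_factory=dict)

    def add_group(self, group: ModelGroup) -> None:
        self.groups[group.name] = group


llms = Collection("llms")
group = ModelGroup("llama-2-7b-finetuned")
llms.add_group(group)

# Two fine-tuning runs, each registered as its own version (metrics are made up).
first = group.add_version({"eval_loss": 1.9})
second = group.add_version({"eval_loss": 1.7})
print(first, second, sorted(llms.groups))
```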

To better support generative AI applications, Model Registry released two new features: ModelDataSource and source model URI. The following sections explore these features and how to use them.

ModelDataSource speeds up deployment and provides access to EULA dependent models

Until now, registering a model in Model Registry required the model artifacts to be stored together with the inference code in a compressed format. This posed challenges for generative AI applications, where FMs are very large, with billions of parameters. Storing FMs as compressed archives increased SageMaker endpoint startup latency, because decompressing them at run time took a long time. The model_data_source parameter can now accept the location of unzipped model artifacts in Amazon Simple Storage Service (Amazon S3), which simplifies registration. It also eliminates the need for endpoints to unzip the model weights, reducing endpoint startup latency.

Additionally, public JumpStart models and certain FMs from independent service providers, such as Llama 2, require that their EULA be accepted before the models are used. As a result, when public models from SageMaker JumpStart were tuned, they could not be stored in Model Registry, because a user needed to accept the license agreement. Model Registry now supports an EULA acceptance flag within the model_data_source parameter, which allows such models to be registered. Customers can now catalog and version a wider variety of FMs in Model Registry and associate metadata, such as training metrics, with them.

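from sagemaker.model import Model

# Point at uncompressed model artifacts under an S3 prefix and accept the EULA.
model_data_source = {
    "S3DataSource": {
        "S3Uri": "s3://bucket/model/prefix/",
        "S3DataType": "S3Prefix",
        "CompressionType": "None",
        "ModelAccessConfig": {
            "AcceptEula": True
        },
    }
}

# sagemaker_session and IMAGE_URI are defined elsewhere.
model = Model(
    sagemaker_session=sagemaker_session,
    image_uri=IMAGE_URI,
    model_data=model_data_source
)
model.register()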

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgeneration-llama-2-7b"
my_model = JumpStartModel(model_id=model_id)
# Register the JumpStart model with the EULA accepted, then deploy the registered version.
registered_model = my_model.register(accept_eula=True)
predictor = registered_model.deploy()
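
If you build the model_data_source dictionary in several places, a small helper can keep it consistent. The following is a minimal sketch, assuming the dictionary shape shown earlier in this section. The build_model_data_source function is hypothetical, and normalizing the prefix to end with a trailing slash is an assumption about how S3 prefixes should be written, so verify it against the SageMaker documentation.

```python
from typing import Any, Dict


def build_model_data_source(s3_prefix: str, accept_eula: bool = False) -> Dict[str, Any]:
    """Build a model data source dict for uncompressed artifacts under an S3 prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("expected an s3:// URI, got: " + s3_prefix)
    # Assumption: a prefix (as opposed to a single object) ends with a slash.
    if not s3_prefix.endswith("/"):
        s3_prefix += "/"
    source: Dict[str, Any] = {
        "S3Uri": s3_prefix,
        "S3DataType": "S3Prefix",
        "CompressionType": "None",  # artifacts are stored unzipped
    }
    # Only include the EULA flag when the caller explicitly accepts it.
    if accept_eula:
        source["ModelAccessConfig"] = {"AcceptEula": True}
    return {"S3DataSource": source}


data_source = build_model_data_source("s3://bucket/model/prefix", accept_eula=True)
print(data_source)
```

Leaving the EULA flag out unless it is explicitly requested keeps license acceptance a deliberate choice by the caller.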

Source model URI provides simplified registration and proprietary model support

Model Registry now supports automatic population of inference specification files for some recognized model IDs, including select AWS Marketplace models, hosted models, and versioned model packages in Model Registry. Because the source model URI supports automatic population, you can register proprietary JumpStart models from providers such as AI21 Labs, Cohere, and LightOn without supplying the inference specification file, allowing your organization to use a broader set of FMs in Model Registry.

Previously, to register a trained model in the SageMaker Model Registry, you had to provide the complete inference specification required for deployment, including an Amazon Elastic Container Registry (Amazon ECR) image and the trained model file. With the launch of source_uri support, you can register any model by providing a source model URI, a free-form field that stores a model ID or location, such as a proprietary JumpStart or Amazon Bedrock model ID, an S3 location, or an MLflow model ID. Rather than supplying the details required for deploying to SageMaker hosting at registration time, you can add the artifacts later. After registration, to deploy a model, you can package the model with an inference specification and update Model Registry accordingly.

For example, you can register a model in Model Registry by passing a model Amazon Resource Name (ARN) as the source URI.

model_arn = "<arn of the model to be registered>"
registered_model_package = model.register(
    model_package_group_name="model_group_name",
    source_uri=model_arn
)

Later, you can update the registered model with the inference specification, making it deployable on SageMaker.

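from sagemaker import get_execution_role
from sagemaker.model import ModelPackage

# Register a model version that carries only a source URI.
model_package = sagemaker_session.sagemaker_client.create_model_package(
    ModelPackageGroupName="model_group_name",
    SourceUri="source_uri"
)
# Wrap the registered version so its inference specification can be updated.
mp = ModelPackage(
    role=get_execution_role(sagemaker_session),
    model_package_arn=model_package["ModelPackageArn"],
    sagemaker_session=sagemaker_session
)
mp.update_inference_specification(image_uris=["ecr_image_uri"])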
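
The two-step flow described above (register with a source URI first, attach an inference specification later) can also be sketched as plain request payloads. This is a hedged illustration: the field names follow our understanding of the CreateModelPackage and UpdateModelPackage request shapes and should be checked against the API reference. The three helper functions and the placeholder ARN and image URI are hypothetical.

```python
from typing import Any, Dict, List


def registration_request(group_name: str, source_uri: str) -> Dict[str, Any]:
    """Step 1: register with only a source URI and no inference specification."""
    return {"ModelPackageGroupName": group_name, "SourceUri": source_uri}


def inference_spec_update(
    model_package_arn: str,
    image_uri: str,
    content_types: List[str],
    response_types: List[str],
) -> Dict[str, Any]:
    """Step 2: attach what hosting needs so the version becomes deployable."""
    return {
        "ModelPackageArn": model_package_arn,
        "InferenceSpecification": {
            "Containers": [{"Image": image_uri}],
            "SupportedContentTypes": content_types,
            "SupportedResponseMIMETypes": response_types,
        },
    }


def is_deployable(package: Dict[str, Any]) -> bool:
    """Treat a version as deployable once it lists at least one container."""
    spec = package.get("InferenceSpecification") or {}
    return bool(spec.get("Containers"))


registered = registration_request("model_group_name", "source_uri")
update = inference_spec_update(
    "<model package arn>", "ecr_image_uri", ["application/json"], ["application/json"]
)
print(is_deployable(registered), is_deployable(update))
```

Keeping the two payloads separate mirrors the idea that cataloging a model and making it deployable are distinct steps.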

from sagemaker.jumpstart.model import JumpStartModel

# Register a proprietary JumpStart model by its model ID.
model_id = "ai21-contextual-answers"
my_model = JumpStartModel(model_id=model_id)
model_package = my_model.register()

Conclusion

As organizations continue to adopt generative AI in different parts of their business, having robust model management and versioning becomes paramount. With Model Registry, you can achieve version control, tracking, collaboration, lifecycle management, and governance of FMs.

In this post, we explored how Model Registry can now more effectively support managing generative AI models across the model lifecycle, empowering you to better govern and adopt generative AI to achieve transformational outcomes.

To learn more about Model Registry, see Register and Deploy Models with Model Registry. To get started, visit the SageMaker console.

About the Authors

Chaitra Mathur serves as a Principal Solutions Architect at AWS, where her role involves advising clients on building robust, scalable, and secure solutions on AWS. With a keen interest in data and ML, she assists clients in leveraging AWS AI/ML and generative AI services to address their ML requirements effectively. Throughout her career, she has shared her expertise at numerous conferences and has authored several blog posts in the ML area.

Kait Healy is a Solutions Architect II at AWS. She specializes in working with startups and enterprise automotive customers, where she has experience building AI/ML solutions at scale to drive key business outcomes.

Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.