Build a reverse image search engine with Amazon Titan Multimodal Embeddings in Amazon Bedrock and AWS managed services | Amazon Web Services

In ecommerce, visual search technology revolutionizes how customers find products by enabling them to search for products using images instead of text. Shoppers often have a clear visual idea of what they want but struggle to describe it in words, leading to inefficient and broad text-based search results. For example, searching for a specific red leather handbag with a gold chain using text alone can be cumbersome and imprecise, often yielding results that don’t directly match the user’s intent. By using images, visual search can directly match physical attributes, providing better results quickly and enhancing the overall shopping experience.

A reverse image search engine enables users to upload an image to find related information instead of using text-based queries. It works by analyzing the visual content to find similar images in its database. Companies such as Amazon use this technology to allow users to use a photo or other image to search for similar products on their ecommerce websites. Other companies use it to identify objects, faces, and landmarks, or to discover the original source of an image. Beyond ecommerce, reverse image search engines are invaluable to law enforcement for identifying illegal items for sale and identifying suspects, to publishers for validating the authenticity of visual content, and to healthcare professionals for assisting in medical image analysis. They also help tackle challenges such as misinformation, copyright infringement, and counterfeit products.

In the context of generative AI, significant progress has been made in developing multimodal embedding models that can embed various data modalities—such as text, image, video, and audio data—into a shared vector space. By mapping image pixels to vector embeddings, these models can analyze and compare visual attributes such as color, shape, and size, enabling users to find similar images with specific attributes, leading to more precise and relevant search results.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. The Amazon Bedrock single API access, regardless of the models you choose, gives you the flexibility to use different FMs and upgrade to the latest model versions with minimal code changes.

Exclusive to Amazon Bedrock, the Amazon Titan family of models incorporates 25 years of experience innovating with AI and machine learning at Amazon. Amazon Titan FMs provide customers with a breadth of high-performing image, multimodal, and text model choices, through a fully managed API. With Amazon Titan Multimodal Embeddings, you can power more accurate and contextually relevant multimodal search, recommendation, and personalization experiences for users.

In this post, you will learn how to extract key objects from image queries using Amazon Rekognition and build a reverse image search engine using Amazon Titan Multimodal Embeddings from Amazon Bedrock in combination with Amazon OpenSearch Serverless.

Solution overview

This solution shows how to build a reverse image search engine that retrieves similar images based on an input image query. The post walks through using Amazon Titan Multimodal Embeddings to embed images, storing these embeddings in an OpenSearch Serverless vector index, and using Amazon Rekognition to extract key objects from images for querying the index.

The following diagram illustrates the solution architecture:

The steps of the solution include:

  1. Upload data to Amazon S3: Store the product images in Amazon Simple Storage Service (Amazon S3).
  2. Generate embeddings: Use Amazon Titan Multimodal Embeddings to generate embeddings for the stored images.
  3. Store embeddings: Ingest the generated embeddings into an OpenSearch Serverless vector index, which serves as the vector database for the solution.
  4. Image analysis: Use Amazon Rekognition to analyze the product images and extract labels and bounding boxes for these images. These extracted objects will then be saved as separate images, which can be used for the query.
  5. Convert search query to an embedding: Convert the user’s image search query into an embedding using Amazon Titan Multimodal Embeddings.
  6. Run similarity search: Perform a similarity search on the vector database to find product images that closely match the search query embedding.
  7. Display results: Display the top K similar results to the user.

Prerequisites

To implement the proposed solution, make sure that you have the following:

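  • Access to the Amazon Titan Multimodal Embeddings model in Amazon Bedrock. If you haven’t already, request model access.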
  • An Amazon SageMaker Studio domain. If you haven’t set up a SageMaker Studio domain, see this Amazon SageMaker blog post for instructions on setting up SageMaker Studio for individual users.
  • An Amazon OpenSearch Serverless collection. You can create a vector search collection by following the steps in Create a collection with public network access and data access granted to the Amazon SageMaker Notebook execution role principal.
  • The GitHub repo cloned to the Amazon SageMaker Studio instance. To clone the repo onto your SageMaker Studio instance, choose the Git icon on the left sidebar and enter https://github.com/aws-samples/reverse-image-search-engine.git
  • After the repo has been cloned, navigate to the reverse-image-search-engine.ipynb notebook file and run the cells. This post highlights the important code segments; however, the full code can be found in the notebook.
  • The necessary permissions attached to the Amazon SageMaker notebook execution role to grant read and write access to the Amazon OpenSearch Serverless collection. For more information on managing credentials securely, see the AWS Boto3 documentation. Make sure that full access is granted to the SageMaker execution role by applying the following IAM policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "aoss:*",
            "Resource": "*"
        }
    ]
}

Upload the dataset to Amazon S3

In this solution, we will use the Shoe Dataset from Kaggle.com, which contains a collection of approximately 1,800 shoe images. The dataset is primarily used for image classification use cases and contains images of shoes from six main categories—boots, sneakers, flip flops, loafers, sandals, and soccer shoes—with 249 JPEG images for each shoe type. For this tutorial, you will concentrate on the loafers folder found in the training category folder.

To upload the dataset

  1. Download the dataset: Go to the Shoe Dataset page on Kaggle.com and download the dataset file (350.79MB) that contains the images.
  2. Extract the specific folder: Extract the downloaded file and navigate to the loafers category within the training folder.
  3. Create an Amazon S3 bucket: Sign in to the Amazon S3 console, choose Create bucket, and follow the prompts to create a new S3 bucket.
  4. Upload images to the Amazon S3 bucket using the AWS CLI: Open your terminal or command prompt and run the following command to upload the images from the loafers folder to the S3 bucket:
    aws s3 cp </path/to/local/folder> s3://<your-bucket-name>/ --recursive

Replace </path/to/local/folder> with the path to the loafers category folder from the training folder on your local machine. Replace <your-bucket-name> with the name of your S3 bucket. For example:
aws s3 cp /Users/username/Documents/training/loafers s3://footwear-dataset/ --recursive

  5. Confirm the upload: Go back to the S3 console, open your bucket, and verify that the images have been successfully uploaded to the bucket.

Create image embeddings

Vector embeddings represent information—such as text or images—as a list of numbers, with each number capturing specific features. For example, in a sentence, some numbers might represent the presence of certain words or topics, while in an image or video, they might represent colors, shapes, or patterns. This numerical representation, or vector, is placed in a multidimensional space called the embedding space, where distances between vectors indicate similarities between the represented information. The closer vectors are to one another in this space, the more similar the information they represent is. The following figure is an example of an image and part of its associated vector.

Example of image embedding
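
To make the idea of distance in the embedding space concrete, the following minimal sketch compares three toy vectors using Euclidean distance and cosine similarity. The three-dimensional vectors and their names are invented for illustration and were not produced by any model; real embeddings from Amazon Titan Multimodal Embeddings have far more dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (illustrative values only)
brown_loafer = [0.9, 0.1, 0.2]
tan_loafer = [0.8, 0.2, 0.25]
red_sneaker = [0.1, 0.9, 0.7]

# The two loafers should be closer to each other than to the sneaker
print("loafer vs loafer :", euclidean_distance(brown_loafer, tan_loafer))
print("loafer vs sneaker:", euclidean_distance(brown_loafer, red_sneaker))
print("cosine, loafers  :", cosine_similarity(brown_loafer, tan_loafer))
```

A smaller Euclidean distance or a cosine similarity closer to 1 both indicate that two items are more alike; which measure a search uses depends on how the vector index is configured.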

To convert images to vectors, you can use Amazon Titan Multimodal Embeddings to generate image embeddings, which can be accessed through Amazon Bedrock. By default, the model generates vector embeddings with 1,024 dimensions; however, you can choose a smaller dimension size to optimize for speed and performance.
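
As a rough sketch of how a smaller dimension size could be requested, the Titan Multimodal Embeddings request body accepts an optional embeddingConfig object with an outputEmbeddingLength field. The helper name build_embedding_request is hypothetical, and the list of supported lengths below is an assumption that you should confirm against the current model documentation.

```python
import json

# Output sizes assumed to be supported by the model; verify in the model documentation
SUPPORTED_EMBEDDING_LENGTHS = (256, 384, 1024)

def build_embedding_request(base64_image, output_length=1024):
    """Build the JSON request body for an image embedding of the given size."""
    if output_length not in SUPPORTED_EMBEDDING_LENGTHS:
        raise ValueError(
            f"output_length must be one of {SUPPORTED_EMBEDDING_LENGTHS}, got {output_length}"
        )
    return json.dumps({
        "inputImage": base64_image,
        "embeddingConfig": {"outputEmbeddingLength": output_length},
    })

# Example: request a 384-dimension embedding for a placeholder payload
print(build_embedding_request("<BASE64_ENCODED_IMAGE>", output_length=384))
```

The returned string can be passed as the body argument of invoke_model. If you choose a smaller size, the Dimensions value of the vector index must match it.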

To create image embeddings:

  1. The following code segment shows how to create a function that will be used to generate embeddings for the dataset of shoe images stored in the S3 bucket.
    # Import required libraries
    import base64
    import io
    import json
    
    import boto3
    import pandas as pd
    from PIL import Image
    
    # Constants, change to your S3 bucket name and selected AWS region
    BUCKET_NAME = "<YOUR_AMAZON_S3_BUCKET_NAME>"
    BEDROCK_MODEL_ID = "amazon.titan-embed-image-v1"
    REGION = "<YOUR_SELECTED_AWS_REGION>"
    # Define max width and height for resizing to accommodate Bedrock limits
    MAX_WIDTH = 1024  
    MAX_HEIGHT = 1024  
    
    # Initialize AWS clients
    s3 = boto3.client('s3')
    bedrock_client = boto3.client(
        "bedrock-runtime", 
        REGION, 
        endpoint_url=f"https://bedrock-runtime.{REGION}.amazonaws.com"
    )
    
    # Function to resize image
    def resize_image(image_data):
        image = Image.open(io.BytesIO(image_data))
    
        # Resize image while maintaining aspect ratio
        image.thumbnail((MAX_WIDTH, MAX_HEIGHT))
    
        # Save resized image to bytes buffer
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG")
        buffer.seek(0)
    
        return buffer.read()
    
    # Function to create embedding from input image
    def create_image_embedding(image):
        image_input = {}
    
        if image is not None:
            image_input["inputImage"] = image
        else:
            raise ValueError("Image input is required")
    
        image_body = json.dumps(image_input)
    
        # Invoke Amazon Bedrock with encoded image body
        bedrock_response = bedrock_client.invoke_model(
            body=image_body,
            modelId=BEDROCK_MODEL_ID,
            accept="application/json",
            contentType="application/json"
        )
    
        # Retrieve body in JSON response
        final_response = json.loads(bedrock_response.get("body").read())
    
        embedding_error = final_response.get("message")
    
        if embedding_error is not None:
            print (f"Error creating embeddings: {embedding_error}")
    
        # Return embedding value
        return final_response.get("embedding")

  2. Because you will be performing a search for similar images stored in the S3 bucket, you will also have to store the image file name as metadata for its embedding. Also, because the model expects a base64 encoded image as input, you will have to create an encoded version of the image for the embedding function. You can use the following code to fulfill both requirements.
    # Retrieve images stored in S3 bucket 
    response = s3.list_objects_v2(Bucket=BUCKET_NAME)
    contents = response.get('Contents', [])
    
    # Define arrays to hold embeddings and image file key names
    image_embeddings = []
    image_file_names = []
    
    # Loop through S3 bucket to encode each image, generate its embedding, and append to array
    for obj in contents:
        image_data = s3.get_object(Bucket=BUCKET_NAME, Key=obj['Key'])['Body'].read()
    
        # Resize the image to meet model requirements
        resized_image = resize_image(image_data)
    
        # Create base64 encoded image for Titan Multimodal Embeddings model input
        base64_encoded_image = base64.b64encode(resized_image).decode('utf-8')
    
        # Generate the embedding for the resized image
        image_embedding = create_image_embedding(image=base64_encoded_image)
        image_embeddings.append(image_embedding)
        image_file_names.append(obj["Key"])

  3. After generating embeddings for each image stored in the S3 bucket, the resulting embedding list can be obtained by running the following code:
    # Add and list embeddings with associated image file key to dataframe object
    final_embeddings_dataset = pd.DataFrame({'image_key': image_file_names, 'image_embedding': image_embeddings})
    final_embeddings_dataset.head()

image_key        image_embedding
image1.jpeg      [0.00961759, 0.0016261627, -0.0024508594, -0.0…
image10.jpeg     [0.008917685, -0.0013863152, -0.014576114, 0.0…
image100.jpeg    [0.006402869, 0.012893448, -0.0053941975, -0.0…
image101.jpg     [0.06542923, 0.021960363, -0.030726435, -0.000…
image102.jpeg    [0.0134112835, -0.010299515, -0.0044046864, -0…
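
The loop in step 2 calls list_objects_v2 once, which returns at most 1,000 keys per request and does not filter out non-image objects. That is enough for the loafers folder, but for larger buckets a paginator may be a safer choice. The following is a minimal sketch under that assumption; the helper name list_image_keys and the extension list are illustrative and not part of the original notebook.

```python
def list_image_keys(s3_client, bucket_name, extensions=(".jpg", ".jpeg", ".png")):
    """Return all object keys in the bucket that end with an image extension."""
    suffixes = tuple(ext.lower() for ext in extensions)
    keys = []
    # The paginator transparently follows continuation tokens across pages
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(suffixes):
                keys.append(obj["Key"])
    return keys

# Example usage with the client defined earlier (requires AWS credentials):
# image_keys = list_image_keys(s3, BUCKET_NAME)
```

The returned keys can then be passed to s3.get_object one at a time, in the same way as the loop in step 2.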

Upload embeddings to Amazon OpenSearch Serverless

Now that you have created embeddings for your images, you need to store these vectors so they can be searched and retrieved efficiently. To do so, you can use a vector database.

A vector database is a type of database designed to store and retrieve vector embeddings. Each data point in the database is associated with a vector that encapsulates its attributes or features. This makes it particularly useful for tasks such as similarity search, where the goal is to find objects that are the most similar to a given query object. To search against the database, you can use a vector search, which is performed using the k-nearest neighbors (k-NN) algorithm. When you perform a search, the algorithm computes a similarity score between the query vector and the vectors of stored objects using methods such as cosine similarity or Euclidean distance. This enables the database to retrieve the closest objects that are most similar to the query object in terms of their features or attributes. Vector databases often use specialized vector search engines, such as nmslib or faiss, which are optimized for efficient storage, retrieval, and similarity calculation of vectors.
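
To illustrate what a k-NN search computes, the following sketch performs an exhaustive (brute-force) search over a tiny in-memory catalog using Euclidean distance. The catalog, its two-dimensional vectors, and the function name knn_search are made up for illustration. Production engines typically rely on approximate index structures rather than scanning every vector, so treat this only as a conceptual model.

```python
import heapq
import math

def knn_search(query, stored, k=3):
    """Return the k (distance, name) pairs closest to the query vector.

    `stored` maps an item name to its vector.
    """
    def distance(vector):
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))

    scored = ((distance(vector), name) for name, vector in stored.items())
    # nsmallest keeps only the k pairs with the smallest distances
    return heapq.nsmallest(k, scored)

# Hypothetical catalog with two-dimensional vectors
catalog = {
    "image1.jpeg": [0.10, 0.20],
    "image2.jpeg": [0.80, 0.90],
    "image3.jpeg": [0.15, 0.25],
    "image4.jpeg": [0.50, 0.50],
}

for dist, name in knn_search([0.12, 0.22], catalog, k=2):
    print(f"{name}: distance {dist:.4f}")
```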

In this post, you will use OpenSearch Serverless as the vector database for the image embeddings. OpenSearch Serverless is a serverless option for OpenSearch Service, a powerful storage option built for distributed search and analytics use cases. With Amazon OpenSearch Serverless, you don’t need to provision, configure, and tune the instance clusters that store and index your data.

To upload embeddings:

  1. If you have set up your Amazon OpenSearch Serverless collection, the next step is to create a vector index. In the Amazon OpenSearch Service console, choose Serverless Collections, then select your collection.
  2. Choose Create vector index.

Create vector index in OpenSearch Collection

  3. Next, create a vector field by entering a name, defining an engine, and adding the dimensions and search configurations.
    1. Vector field name: Enter a name, such as vector.
    2. Engine: Select nmslib.
    3. Dimensions: Enter 1024.
    4. Distance metric: Select Euclidean.
    5. Choose Confirm.

  4. To tag each embedding with the image file name, you must also add a mapping field under Metadata management.
    1. Mapping field: Enter image_file.
    2. Data type: Select String.
    3. Filterable: Select True.
    4. Choose Create to create the index.

Review and confirm vector index creation

  5. Now that the vector index has been created, you can ingest the embeddings. To do so, run the following code segment to connect to your Amazon OpenSearch Serverless collection.
# Import required libraries to connect to the Amazon OpenSearch Serverless collection
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Initialize endpoint name constant
HOST = "<YOUR_HOST_ENDPOINT_NAME>" # For example, abcdefghi.us-east-1.aoss.amazonaws.com (without https://)

# Initialize and authenticate with the OpenSearch client
credentials = boto3.Session().get_credentials()
auth = AWS4Auth(credentials.access_key, credentials.secret_key, REGION, 'aoss', session_token=credentials.token)
client = OpenSearch(
    hosts=[{'host': HOST, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=300
)

  6. After connecting, you can ingest your embeddings and the associated image key for each vector as shown in the following code.
# Import required library to iterate through dataset
import tqdm.notebook as tq

INDEX_NAME = "<YOUR_VECTOR_INDEX_NAME>"
VECTOR_NAME = "<YOUR_VECTOR_FIELD_NAME>"
VECTOR_MAPPING = "<YOUR_MAPPING_FIELD_NAME>"

# Ingest embeddings into vector index with associated vector and text mapping fields
for idx, record in tq.tqdm(final_embeddings_dataset.iterrows(), total=len(final_embeddings_dataset)):
    body = {
        VECTOR_NAME: record['image_embedding'],
        VECTOR_MAPPING: record['image_key']
    }
    response = client.index(index=INDEX_NAME, body=body)
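
The loop above sends one request per image. For larger datasets, batching documents through the bulk helper in opensearch-py may reduce the number of round trips. The following sketch only builds the bulk actions; the function name build_bulk_actions is hypothetical, and whether bulk ingestion suits your collection is an assumption to validate in your own environment.

```python
def build_bulk_actions(records, index_name, vector_field, mapping_field):
    """Yield one bulk index action per (image_key, embedding) pair."""
    for image_key, embedding in records:
        if embedding is None:
            # Skip images for which no embedding was returned
            continue
        yield {
            "_index": index_name,
            vector_field: embedding,
            mapping_field: image_key,
        }

# Example usage with the objects defined earlier (requires a live collection):
# from opensearchpy import helpers
# records = zip(final_embeddings_dataset["image_key"],
#               final_embeddings_dataset["image_embedding"])
# helpers.bulk(client, build_bulk_actions(records, INDEX_NAME, VECTOR_NAME, VECTOR_MAPPING))
```

Skipping records without an embedding avoids indexing empty vectors when a call to the embedding model fails for an individual image.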

Use Amazon Rekognition to extract key objects

Now that the embeddings have been created, use Amazon Rekognition to extract objects of interest from your search query. Amazon Rekognition analyzes images to identify objects, people, text, and scenes by detecting labels and generating bounding boxes. In this use case, Amazon Rekognition will be used to detect shoe labels in query images.

To view the bounding boxes around your respective images, run the following code. If you want to apply this to your own sample images, make sure to specify the labels you want to identify. Upon completion of the bounding box and label generation, the extracted objects will be saved in your local directory in the SageMaker Notebook environment.

# Import required libraries to draw bounding box on image
from PIL import Image, ImageDraw, ImageFont

# Function to draw bounding boxes and extract labeled objects
def process_image(image_path, boxes, labels):
    # Load the image
    image = Image.open(image_path)
    
    # Convert RGBA to RGB if necessary
    if image.mode == 'RGBA':
        image = image.convert('RGB')
    
    draw = ImageDraw.Draw(image)
    
    # Font for the label
    try:
        font = ImageFont.truetype("arial.ttf", 15)
    except IOError:
        font = ImageFont.load_default()

    # Counter for unique filenames
    crop_count = 1 
    
    # Keep an unmarked copy so extracted objects do not contain drawn boxes
    clean_image = image.copy()
    
    # Extract objects with the label of interest (ex. Shoe) and draw their bounding boxes
    for box, label in zip(boxes, labels):
    
        # Change to specific label you are looking to extract
        if label != "Shoe":
            continue
        
        # Box coordinates (values are ratios of the image width and height)
        left = int(image.width * box['Left'])
        top = int(image.height * box['Top'])
        right = left + int(image.width * box['Width'])
        bottom = top + int(image.height * box['Height'])
            
        # Crop the unmarked image to the bounding box
        cropped_image = clean_image.crop((left, top, right, bottom))
    
        # File name for the output
        file_name = f"extract_{crop_count}.jpg"
        # Save extracted object image locally
        cropped_image.save(file_name)
        print(f"Saved extracted object image: {file_name}")
        crop_count += 1
    
        # Draw the bounding box and its label on the full image
        draw.rectangle([left, top, right, bottom], outline="red", width=3)
        draw.text((left, max(top - 15, 0)), label, fill="red", font=font)
    
    # Display the image with bounding boxes
    image.show()
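
The process_image function expects a list of bounding boxes and a parallel list of labels. One way to obtain them, sketched below, is to flatten the response of the Amazon Rekognition DetectLabels operation. The helper name extract_boxes_and_labels, the confidence threshold, and the query.jpg file name are illustrative assumptions; see the notebook for the full code used in this post.

```python
def extract_boxes_and_labels(rekognition_response, min_confidence=80.0):
    """Flatten a DetectLabels response into parallel lists of boxes and labels."""
    boxes, labels = [], []
    for label in rekognition_response.get("Labels", []):
        # Only labels that include Instances carry bounding boxes
        for instance in label.get("Instances", []):
            if instance.get("Confidence", 0.0) >= min_confidence:
                boxes.append(instance["BoundingBox"])
                labels.append(label["Name"])
    return boxes, labels

# Example usage (requires AWS credentials; "query.jpg" is a placeholder):
# rekognition = boto3.client("rekognition", region_name=REGION)
# with open("query.jpg", "rb") as image_file:
#     response = rekognition.detect_labels(Image={"Bytes": image_file.read()})
# boxes, labels = extract_boxes_and_labels(response)
# process_image("query.jpg", boxes, labels)
```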

The following image shows the output image with the respective labels within the bounding boxes:

Embed object image

Now that the object of interest within the image has been extracted, you need to generate an embedding for it so that it can be searched against the stored vectors in the Amazon OpenSearch Serverless index. To do so, choose the best extracted image from the local directory where the extracted objects were saved. Make sure that the image is unobstructed, high-quality, and effectively captures the features that you’re searching for. After you have identified the best image, paste its file name as shown in the following code.

# Open the extracted object image file in binary mode
# Paste your extracted image from the local download directory in the notebook below
with open("<YOUR_LOCAL_EXTRACTED_IMAGE (ex. extract_1.jpg)>", "rb") as image_file:
    base64_encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

# Embed the extracted object image
object_embedding = create_image_embedding(image=base64_encoded_image)

# Print the first few numbers of the embedding followed by ...
print(f"Image embedding: {object_embedding[:5]} ...")

Perform a reverse image search

With the embedding of the extracted object, you can now perform a search against the Amazon OpenSearch Serverless vector index to retrieve the closest matching images using the k-NN algorithm. When you created your vector index earlier, you specified that similarity between vectors is calculated using the Euclidean distance metric with the nmslib engine. With this configuration, you can define the number of results to retrieve from the index and invoke the Amazon OpenSearch Service client with a search request as shown in the following code.

# Define number of images to search and retrieve
K_SEARCHES = 3

# Define search configuration body for K-NN 
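body = {
        "size": K_SEARCHES,
        # Leave the stored vector out of the returned documents
        "_source": {
            "excludes": [VECTOR_NAME],
        },
        "query": {
            "knn": {
                # The key must match the vector field name defined in the index
                VECTOR_NAME: {
                    "vector": object_embedding,
                    "k": K_SEARCHES,
                }
            }
        },
        "fields": [VECTOR_MAPPING],
    }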

# Invoke OpenSearch to search through index with K-NN configurations
knn_response = client.search(index=INDEX_NAME, body=body)
result = []
scores_tracked = set()  # Set to keep track of already retrieved images and their scores

# Loop through response to print the closest matching results
for hit in knn_response["hits"]["hits"]:
    id_ = hit["_id"]
    score = hit["_score"]
    item_id_ = hit["_source"][VECTOR_MAPPING]

    # Check if score has already been tracked, if not, add it to final result
    if score not in scores_tracked:
        final_item = [item_id_, score]
        result.append(final_item)
        scores_tracked.add(score)  # Log score as tracked already

# Print Top K closest matches
print(f"Top {K_SEARCHES} closest embeddings and associated scores: {result}")
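
The loop above treats two hits with the same score as duplicates, which would also drop distinct images that happen to tie. If duplicates in your index come from ingesting the same file more than once (an assumption about your data), deduplicating on the file name may be a closer fit. The helper name unique_hits_by_file below is hypothetical.

```python
def unique_hits_by_file(hits, mapping_field):
    """Keep the first (highest-ranked) hit for each file name, preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        file_name = hit["_source"][mapping_field]
        if file_name not in seen:
            seen.add(file_name)
            unique.append([file_name, hit["_score"]])
    return unique

# Example usage with the search response from above:
# result = unique_hits_by_file(knn_response["hits"]["hits"], VECTOR_MAPPING)
```

Because search hits are returned in descending score order by default, the first occurrence of a file name is also its best-scoring one.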

Because the preceding search retrieves the file names that are associated with the closest matching vectors, the next step is to fetch each specific image to display the results. This can be accomplished by downloading the specific image from the S3 bucket to a local directory in the notebook, then displaying each one sequentially. Note that if your images are stored within a subdirectory in the bucket, you might need to add the appropriate prefix to the bucket path as shown in the following code.

import os

# Function to display image
def display_image(image_path):
    image = Image.open(image_path)
    image.show()
    
# List of image file names from the K-NN search
image_files = result

# Create a local directory to store downloaded images
download_dir="RESULTS"

# Create directory if not exists
os.makedirs(download_dir, exist_ok=True)

# Download and display each image that matches image query
for file_name in image_files:
    print("File Name: " + file_name[0])
    print("Score: " + str(file_name[1]))
    local_path = os.path.join(download_dir, file_name[0])
    # Ensure to add in the necessary prefix before the file name if files are in subdirectories in the bucket
    # ex. s3.download_file(BUCKET_NAME, "training/loafers/"+file_name[0], local_path)
    s3.download_file(BUCKET_NAME, file_name[0], local_path)
    # Open downloaded image and display it
    display_image(local_path)
    print()
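
If the keys stored in the index include a prefix (for example, training/loafers/image17.jpeg), joining the key directly to the download directory points at a local subdirectory that does not exist yet. One possible workaround, sketched here with a hypothetical helper named local_path_for_key, is to keep only the final path segment for the local file while still passing the full key to s3.download_file.

```python
import os

def local_path_for_key(download_dir, s3_key):
    """Map an S3 key to a flat local path by dropping any prefix."""
    file_name = s3_key.rsplit("/", 1)[-1]
    if not file_name:
        raise ValueError(f"Key does not name a file: {s3_key!r}")
    return os.path.join(download_dir, file_name)

# Example usage inside the download loop:
# local_path = local_path_for_key(download_dir, file_name[0])
# s3.download_file(BUCKET_NAME, file_name[0], local_path)
print(local_path_for_key("RESULTS", "training/loafers/image17.jpeg"))
```

Note that flattening can cause name collisions if two prefixes contain files with the same name.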

The following images show the results for the closest matching products in the S3 bucket related to the extracted object image query:

First match:
File Name: image17.jpeg
Score: 0.64478767
Image of first match from search

Second match:
File Name: image209.jpeg
Score: 0.64304984
Image of second match from search

Third match:
File Name: image175.jpeg
Score: 0.63810235
Image of third match from search
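
The scores shown above are similarity scores rather than raw distances, so higher is better. Assuming the index uses the Euclidean (l2) distance metric configured earlier, the OpenSearch k-NN documentation describes the score as 1 / (1 + d²), where d is the Euclidean distance between the query and the stored vector. The following sketch inverts that relationship; the function names are illustrative, and the formula should be checked against the documentation for your engine and space type.

```python
import math

def l2_score_to_distance(score):
    """Convert a k-NN score for the l2 space type back to a Euclidean distance."""
    if not 0 < score <= 1:
        raise ValueError("Score must be in the interval (0, 1]")
    return math.sqrt(1.0 / score - 1.0)

def l2_distance_to_score(distance):
    """Convert a Euclidean distance to the corresponding k-NN score."""
    return 1.0 / (1.0 + distance ** 2)

# Scores taken from the three matches shown above
for score in (0.64478767, 0.64304984, 0.63810235):
    print(f"score {score} -> distance {l2_score_to_distance(score):.4f}")
```

Under this assumption, a score of 1.0 corresponds to an identical vector, and the score decreases toward 0 as the distance grows.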

Clean up

To avoid incurring future charges, delete the resources used in this solution.

  1. Delete the vector index in the Amazon OpenSearch Serverless collection.
  2. Delete the Amazon OpenSearch Serverless collection.
  3. Delete the Amazon SageMaker resources.
  4. Empty and delete the Amazon S3 bucket.

Conclusion

By combining the power of Amazon Rekognition for object detection and extraction, Amazon Titan Multimodal Embeddings for generating vector representations, and Amazon OpenSearch Serverless for efficient vector indexing and search capabilities, you successfully created a robust reverse image search engine. This solution enhances product recommendations by providing precise and relevant results based on visual queries, thereby significantly improving the user experience for ecommerce solutions.


About the Authors

Nathan Pogue is a Solutions Architect on the Canadian Public Sector Healthcare and Life Sciences team at AWS. Based in Toronto, he focuses on empowering his customers to expand their understanding of AWS and utilize the cloud for innovative use cases. He is particularly passionate about AI/ML and enjoys building proof-of-concept solutions for his customers.

Waleed Malik is a Solutions Architect with the Canadian Public Sector EdTech team at AWS. He holds six AWS certifications, including the Machine Learning Specialty Certification. Waleed is passionate about helping customers deepen their knowledge of AWS by translating their business challenges into technical solutions.


