Answer questions from tables embedded in documents with Amazon Q Business | Amazon Web Services


Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. A large portion of that information is found in text narratives stored in various document formats such as PDFs, Word files, and HTML pages. Some information is also stored in tables (such as price or product specification tables) embedded in those same document types, CSVs, or spreadsheets. Although Amazon Q Business can provide accurate answers from narrative text, getting answers from these tables requires special handling of more structured information.

On November 21, 2024, Amazon Q Business launched support for tabular search, which you can use to extract answers from tables embedded in documents ingested in Amazon Q Business. Tabular search is a built-in feature in Amazon Q Business that works across many domains, with no setup required from administrators or end users.

In this post, we ingest different types of documents that have tables and show you how Amazon Q Business responds to questions related to the data in the tables.

Prerequisites

To follow along with this walkthrough, you need to have the following prerequisites in place:

  • An AWS account where you can follow the instructions in this post.
  • At least one Amazon Q Business user. For information, refer to Amazon Q Business pricing.
  • Cross-Region inference enabled on the Amazon Q Business application.
  • Amazon Q Business applications created on or after November 21, 2024, automatically benefit from the new capability. If your application was created before this date, you must reingest your content to update its index.

Overview of tabular search

Tabular search extends Amazon Q Business capabilities to find answers beyond text paragraphs, analyzing tables embedded in enterprise documents so you can get answers to a wide range of queries, including factual lookup from tables.

With tabular search in Amazon Q Business, you can ask questions such as, “what’s the credit card with the lowest APR and no annual fees?” or “which credit cards offer travel insurance?” where the answers may be found in a product-comparison table, inside a marketing PDF stored in an internal repository, or on a website.

This feature supports a wide range of file formats, including PDF, Word documents, CSV files, Excel spreadsheets, HTML, and SmartSheet (via SmartSheet connector). Notably, tabular search can also extract data from tables represented as images within PDFs and retrieve information from single or multiple cells. Additionally, it can perform aggregations on numerical data, providing users with valuable insights.

Ingest documents in Amazon Q Business

To create an Amazon Q Business application, retriever, and index to pull data in real time during a conversation, follow the steps under the Create and configure your Amazon Q application section in the AWS Machine Learning Blog post, Discover insights from Amazon S3 with Amazon Q S3 connector.

For this post, we use The World’s Billionaires, which lists the world’s top 10 billionaires from 1987 through 2024 in a tabular format. You can download this data as a PDF from Wikipedia using the Tools menu. Upload the PDF to an Amazon Simple Storage Service (Amazon S3) bucket and use it as a data source in your Amazon Q Business application.
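If you prefer to script the upload rather than use the console, the following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured. The helper name `upload_document`, the bucket name, and the prefix are hypothetical placeholders and not part of the original walkthrough; the only SDK call used is the S3 client's `upload_file`.

```python
from pathlib import Path


def upload_document(s3_client, local_path: str, bucket: str, prefix: str = "") -> str:
    """Upload one local file to S3 and return the object key that was used.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    path = Path(local_path)
    clean_prefix = prefix.strip("/")
    # Place the file under the optional prefix, keeping its original file name.
    key = f"{clean_prefix}/{path.name}" if clean_prefix else path.name
    s3_client.upload_file(str(path), bucket, key)
    return key


# Example usage (bucket and file names are placeholders, not from the post):
#   import boto3
#   upload_document(boto3.client("s3"), "The_Worlds_Billionaires.pdf",
#                   "my-q-business-docs", prefix="tables")
```

After the upload, sync the data source so that the new document is indexed before you start asking questions.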

Run queries with Amazon Q

You can start asking Amazon Q questions using the web experience URL, which can be found on the Applications page, as shown in the following screenshot.

Suppose we want to know the ratio of men to women who appeared on the Forbes 2024 list of the world’s billionaires. As you can tell from the following screenshot of The World’s Billionaires PDF, there were 383 women and 2398 men.

To use Amazon Q Business to elicit that information from the PDF, enter the following in the web experience chatbot:

“In 2024, what is the ratio of men to women who appeared in the Forbes 2024 billionaire’s list?”

Amazon Q Business supplies the answer, as shown in the following screenshot.
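To check the assistant's arithmetic independently, you can compute the ratio from the two counts quoted above (2398 men and 383 women). This is only a sanity check on the numbers in the table, not a reproduction of how Amazon Q Business derives its answer; the variable names are illustrative.

```python
from fractions import Fraction

# Counts read from the 2024 row of The World's Billionaires table.
men, women = 2398, 383

ratio = Fraction(men, women)                          # reduced to lowest terms
men_per_woman = round(men / women, 2)                 # decimal form of the ratio
share_women = round(100 * women / (men + women), 1)   # women as a percentage of the list

print(f"Ratio of men to women: {ratio.numerator}:{ratio.denominator}")
print(f"Men per woman: {men_per_woman}")
print(f"Women as a share of the list: {share_women}%")
```

The two counts share no common factor, so the ratio stays 2398:383, or about 6.26 men for every woman on the list.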

The following screenshot shows the list of the top 10 billionaires from 2009.

We enter “How many of the top 10 billionaires in 2009 were from countries outside the United States?”

Amazon Q Business provides an answer, as shown in the following screenshot.

Next, to demonstrate how Amazon Q Business can pull data from a CSV file, we used the example of crime statistics found here.

We enter the question, “How many incidents of crime were reported in Hollywood?”

Amazon Q Business provides the answer, as shown in the following screenshot.
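To verify a count like this outside of Amazon Q Business, you can tally the matching rows in the CSV yourself. The following is a minimal sketch using only the Python standard library. The sample rows and the column names (`AREA NAME`, `Crm Cd Desc`) are hypothetical stand-ins; substitute the headers that appear in your own file.

```python
import csv
import io

# Hypothetical sample rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """DR_NO,AREA NAME,Crm Cd Desc
1001,Hollywood,BURGLARY
1002,Central,VEHICLE - STOLEN
1003,Hollywood,BATTERY - SIMPLE ASSAULT
1004,Wilshire,BURGLARY
"""


def count_rows(csv_text: str, column: str, value: str) -> int:
    """Count rows whose `column` equals `value`, ignoring case and surrounding spaces."""
    reader = csv.DictReader(io.StringIO(csv_text))
    target = value.strip().lower()
    return sum(
        1 for row in reader
        if (row.get(column) or "").strip().lower() == target
    )


hollywood_count = count_rows(SAMPLE_CSV, "AREA NAME", "Hollywood")
print(f"Incidents reported in Hollywood (sample data): {hollywood_count}")
```

For a real file, read its contents with `open(path, newline="")` and pass the text to `count_rows`, then compare the result with the number in the assistant's response.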

Metadata boosting

To improve the accuracy of responses from an Amazon Q Business application with CSV files, you can add metadata to documents in an S3 bucket by using a metadata file. Metadata is additional information that describes a document and improves retrieval accuracy for context-poor document formats, for example, a CSV with cryptic column names. Additional fields, such as the document's title and the date and time it was created, can also be useful if you want to search by title or retrieve documents from a certain time period.

You can do this by following Enable document attributes for search in Amazon Q Business.

Additional details about metadata boosting can be found at Configuring document attributes for boosting in Amazon Q Business in the Amazon Q User Guide.
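As an illustration, the following sketch builds a metadata file for a CSV document. It assumes the JSON layout used for S3 document metadata, with a `Title` field and an `Attributes` map that includes the reserved `_created_at` attribute, stored next to the document as `<file name>.metadata.json`. Confirm the exact field names and file placement against the Amazon Q Business User Guide before relying on it. The helper name `build_metadata`, the file name `crime_data.csv`, and the custom attribute `data_description` are hypothetical, and a custom attribute only takes effect after it is mapped to an index field.

```python
import json
from datetime import datetime, timezone


def build_metadata(title: str, created_at: datetime, extra_attributes: dict) -> str:
    """Return the JSON text of a metadata file for one document.

    `created_at` must be timezone-aware; it is written as an ISO 8601 UTC string.
    """
    attributes = {
        "_created_at": created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Custom attributes (hypothetical here) are merged in alongside reserved ones.
    attributes.update(extra_attributes)
    return json.dumps({"Title": title, "Attributes": attributes}, indent=2)


metadata_json = build_metadata(
    title="Crime statistics by area",
    created_at=datetime(2024, 11, 21, tzinfo=timezone.utc),
    extra_attributes={"data_description": "One row per reported incident"},
)

# Write the result next to the document, for example as crime_data.csv.metadata.json.
print(metadata_json)
```

A descriptive title and a plain-language description give the retriever context that the raw column names alone may not provide.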

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created: the Amazon Q application, data sources, and corresponding IAM roles.

To delete the Amazon Q application, follow these steps:

  1. On the Amazon Q console, choose Applications and then select your application.
  2. On the Actions drop-down menu, choose Delete.
  3. To confirm deletion, enter delete in the field and choose Delete. Wait until you get the confirmation message; the process can take up to 15 minutes.
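The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and that the caller passes in a `qbusiness` client; it relies on the client's `delete_application` operation. The helper name `delete_q_application` and the application ID shown in the usage comment are hypothetical placeholders.

```python
def delete_q_application(qbusiness_client, application_id: str) -> None:
    """Request deletion of one Amazon Q Business application.

    `qbusiness_client` is expected to behave like boto3.client("qbusiness").
    """
    if not application_id:
        # Fail early rather than sending an empty identifier to the service.
        raise ValueError("application_id must not be empty")
    qbusiness_client.delete_application(applicationId=application_id)


# Example usage (the application ID is a placeholder, not a real value):
#   import boto3
#   delete_q_application(boto3.client("qbusiness"), "your-application-id")
```

As with the console, deletion is not instantaneous, so allow time for it to complete before removing dependent resources.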

To delete the S3 bucket created in Prepare your S3 bucket as a data source, follow these steps:

  1. Follow the instructions in Emptying a bucket.
  2. Follow the steps in Deleting a bucket.

To delete the IAM Identity Center instance you created as part of the prerequisites, follow the steps at Delete your IAM Identity Center instance.

Conclusion

By following this post, you can ingest different types of documents that contain tables. You can then ask Amazon Q Business questions about the information in those tables and receive answers in natural language.

To learn about metadata search, refer to Configuring metadata controls in Amazon Q Business.

For S3 data source setup, refer to Set up Amazon Q Business application with S3 data source.


About the authors

Jiten Dedhia is a Sr. AIML Solutions Architect with over 20 years of experience in the software industry. He has helped Fortune 500 companies with their AIML/Generative AI needs.

Sapna Maheshwari is a Sr. Solutions Architect at AWS, with a passion for designing impactful tech solutions. She is an engaging speaker who enjoys sharing her insights at conferences.


