Altogether: Image Captioning via Re-aligning Alt-text


By Hu Xu and 12 other authors


Abstract: This paper focuses on creating synthetic data to improve the quality of image captions. Existing works typically have two shortcomings. First, they caption images from scratch, ignoring existing alt-text metadata. Second, they lack transparency when the training data of the captioner (e.g., GPT) is unknown. In this paper, we study a principled approach, Altogether, based on the key idea of editing and re-aligning the existing alt-texts associated with images. To generate training data, we perform human annotation in which annotators start with the existing alt-text and re-align it to the image content over multiple rounds, consequently constructing captions with rich visual concepts. This differs from prior work that carries out human annotation as a one-time description task based solely on images and annotator knowledge. We train a captioner on this data that generalizes the process of re-aligning alt-texts at scale. Our results show that the Altogether approach leads to richer image captions that also improve text-to-image generation and zero-shot image classification.

Submission history

From: Hu Xu
[v1] Tue, 22 Oct 2024 17:59:57 UTC (22,828 KB)
[v2] Thu, 12 Dec 2024 18:26:45 UTC (22,828 KB)
[v3] Sat, 28 Dec 2024 17:46:27 UTC (22,829 KB)




By stp2y
