NARAIM: Native Aspect Ratio Autoregressive Image Models, by Daniel Gallo Fernández and 6 other authors
Abstract: While vision transformers are able to solve a wide variety of computer vision tasks, no pre-training method has yet demonstrated the same scaling laws as those observed in language models. Autoregressive models show promising results, but they are commonly trained on images that are cropped or resized into squares, which distorts or destroys information present in the input. To overcome this limitation, we propose NARAIM, a vision model pre-trained with an autoregressive objective that uses images in their native aspect ratio. By maintaining the native aspect ratio, we preserve the original spatial context, thereby enhancing the model's ability to interpret visual information. In our experiments, we show that maintaining the aspect ratio improves performance on a downstream classification task.
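The abstract does not specify NARAIM's exact patchification or positional-encoding scheme; the following is only a minimal sketch of the general idea of tokenizing an image at its native aspect ratio instead of resizing it to a square. It assumes PyTorch, a fixed patch size, and padding to a patch-size multiple; the function name `patchify_native` is illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F


def patchify_native(image: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Split an image tensor (C, H, W) into a flat sequence of patches
    without cropping or resizing it to a square.

    The image is padded on the right/bottom (if needed) so that H and W are
    multiples of patch_size, then unfolded into a sequence of flattened
    patches of shape (num_patches, C * patch_size * patch_size).
    """
    c, h, w = image.shape
    pad_h = (-h) % patch_size  # rows needed to reach a multiple of patch_size
    pad_w = (-w) % patch_size  # columns needed to reach a multiple of patch_size
    image = F.pad(image, (0, pad_w, 0, pad_h))  # pad width, then height

    # Unfold height and width into non-overlapping patch_size x patch_size tiles:
    # (C, H/ps, W/ps, ps, ps)
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)

    # Reorder to (H/ps, W/ps, C, ps, ps) and flatten into a token sequence.
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)
    return patches


# Example: a non-square 3 x 480 x 640 image yields a 30 x 40 = 1200-patch sequence.
tokens = patchify_native(torch.randn(3, 480, 640), patch_size=16)
print(tokens.shape)  # torch.Size([1200, 768])
```

Because the number of patches now varies with each image's shape, such a pipeline typically pairs this with 2D positional information and sequence padding or packing during batching; those details are not described in the abstract.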
Submission history
From: Daniel Gallo Fernández
[v1] Sun, 13 Oct 2024 21:13:48 UTC (3,793 KB)
[v2] Wed, 4 Dec 2024 22:21:36 UTC (3,788 KB)