Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

stp2y, November 22, 2024


[Submitted on 4 Jun 2023 (v1), last revised 21 Nov 2024 (this version, v3)]

By Jintao Rong and 5 other authors.


Abstract: The Contrastive Language-Image Pretraining (CLIP) model has been widely used in various downstream vision tasks, and the few-shot learning paradigm is commonly adopted to augment its capacity for these tasks. However, current paradigms may struggle with fine-grained classification, such as satellite image recognition, because of widening domain gaps. To address this limitation, we propose retrieval-enhanced visual prompt learning (RePrompt), which introduces retrieval mechanisms to cache and reuse knowledge of downstream tasks. RePrompt constructs a retrieval database from training examples, or from external data when available, and uses retrieval to enhance multiple stages of a simple prompt-learning baseline, thus narrowing the domain gap. During inference, the enhanced model can reference similar samples brought in by retrieval to make more accurate predictions. A detailed analysis reveals that retrieval improves the distribution of late features, thereby improving generalization on downstream tasks. RePrompt attains state-of-the-art performance on a wide range of vision datasets, including 11 image datasets, 3 video datasets, 1 multi-view dataset, and 4 domain generalization benchmarks.
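The abstract does not spell out RePrompt's architecture, so the following is only a rough illustration of the general idea of retrieval-augmented classification: cache (embedding, label) pairs from training examples, find the k nearest cached samples to a query by cosine similarity, and blend their vote with a prompt-learning baseline's class probabilities. This is a generic kNN-cache interpolation scheme swapped in for illustration, not the authors' method. The names `RetrievalDatabase`, `knn_class_probs`, and `retrieval_enhanced_predict`, along with the parameters `k`, `temperature`, and `alpha`, are hypothetical, and the embeddings are assumed to come from a CLIP-style image encoder (plain Python lists stand in for them here).

```python
import math
from dataclasses import dataclass, field


def l2_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class RetrievalDatabase:
    """Hypothetical cache of (unit-norm embedding, class label) pairs.

    Embeddings are assumed to come from a CLIP-style image encoder;
    here they are plain Python lists for illustration only.
    """
    keys: list = field(default_factory=list)
    labels: list = field(default_factory=list)

    def add(self, embedding, label):
        self.keys.append(l2_normalize(embedding))
        self.labels.append(label)

    def knn_class_probs(self, query, num_classes, k=3, temperature=0.1):
        """Turn the k most similar cached samples into a class distribution."""
        if not self.keys:
            # No cached knowledge yet: fall back to a uniform distribution.
            return [1.0 / num_classes] * num_classes
        q = l2_normalize(query)
        # Cosine similarity to every cached key, highest first, keep top k.
        scored = sorted(
            ((sum(a * b for a, b in zip(q, key)), label)
             for key, label in zip(self.keys, self.labels)),
            reverse=True,
        )[:k]
        best = scored[0][0]
        votes = [0.0] * num_classes
        for sim, label in scored:
            # Closer neighbours get exponentially larger votes
            # (shifted by the best similarity for numerical stability).
            votes[label] += math.exp((sim - best) / temperature)
        total = sum(votes)
        return [v / total for v in votes]


def retrieval_enhanced_predict(prompt_logits, query, db, alpha=0.5, k=3):
    """Blend the baseline's class probabilities with the retrieval vote.

    alpha = 0 recovers the prompt-learning baseline; alpha = 1 is pure kNN.
    """
    p_model = softmax(prompt_logits)
    p_knn = db.knn_class_probs(query, len(prompt_logits), k=k)
    mixed = [(1 - alpha) * pm + alpha * pk for pm, pk in zip(p_model, p_knn)]
    pred = max(range(len(mixed)), key=mixed.__getitem__)
    return pred, mixed


# Toy usage: two classes with 2-D "embeddings" cached from training examples.
db = RetrievalDatabase()
for emb, label in [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]:
    db.add(emb, label)

query = [0.2, 1.0]          # lies close to the cached class-1 samples
prompt_logits = [0.3, 0.1]  # the baseline alone slightly prefers class 0
baseline_pred, _ = retrieval_enhanced_predict(prompt_logits, query, db, alpha=0.0)
pred, probs = retrieval_enhanced_predict(prompt_logits, query, db, alpha=0.5)
print(baseline_pred, pred)  # prints "0 1"
```

Interpolating probabilities rather than raw logits keeps the output a valid distribution, and setting `alpha=0` recovers the baseline, which makes it easy to measure what the retrieval term contributes. The abstract reports that RePrompt applies retrieval at multiple stages of its baseline; this sketch covers only the final prediction step.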

Submission history

From: Jintao Rong
[v1] Sun, 4 Jun 2023 03:06:37 UTC (1,175 KB)
[v2] Tue, 18 Jun 2024 13:16:58 UTC (1,171 KB)
[v3] Thu, 21 Nov 2024 04:12:53 UTC (725 KB)

