AID: Attention Interpolation of Text-to-Image Diffusion

stp2y, October 7, 2024

[Submitted on 26 Mar 2024 (v1), last revised 4 Oct 2024 (this version, v3)]

By Qiyuan He and 3 other authors

Abstract: Conditional diffusion models can create unseen images in various settings, aiding image interpolation. Interpolation in latent spaces is well-studied, but interpolation with specific conditions like text or poses is less understood. Simple approaches, such as linear interpolation in the space of conditions, often result in images that lack consistency, smoothness, and fidelity. To that end, we introduce a novel training-free technique named Attention Interpolation via Diffusion (AID). Our key contributions include 1) proposing an inner/outer interpolated attention layer; 2) fusing the interpolated attention with self-attention to boost fidelity; and 3) applying beta distribution to selection to increase smoothness. We also present a variant, Prompt-guided Attention Interpolation via Diffusion (PAID), that considers interpolation as a condition-dependent generative process. This method enables the creation of new images with greater consistency, smoothness, and efficiency, and offers control over the exact path of interpolation. Our approach demonstrates effectiveness for conceptual and spatial interpolation. Code and demo are available at this https URL.
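
To make the abstract's terms concrete, here is a minimal sketch in plain Python of the "simple approach" it mentions (linear interpolation of condition vectors) alongside one plausible reading of "inner" and "outer" interpolated attention. This is an illustration under assumptions, not the authors' implementation: the abstract does not define the two variants, so the function names, the single-query toy attention, and the interpretation (blend attention outputs for "outer", blend keys and values before attending for "inner") are all hypothetical.

```python
import math


def lerp(a, b, t):
    """Element-wise linear interpolation between two equal-length vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def outer_interpolated_attention(query, kv_a, kv_b, t):
    """Assumed 'outer' form: attend to each endpoint, then blend the outputs."""
    out_a = attention(query, kv_a[0], kv_a[1])
    out_b = attention(query, kv_b[0], kv_b[1])
    return lerp(out_a, out_b, t)


def inner_interpolated_attention(query, kv_a, kv_b, t):
    """Assumed 'inner' form: blend keys and values first, then attend once."""
    keys = [lerp(ka, kb, t) for ka, kb in zip(kv_a[0], kv_b[0])]
    values = [lerp(va, vb, t) for va, vb in zip(kv_a[1], kv_b[1])]
    return attention(query, keys, values)


# Toy data (made up for illustration): one query, two key/value pairs per endpoint.
query = [1.0, 0.0]
kv_a = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
kv_b = ([[0.0, 1.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 2.0]])

midpoint_outer = outer_interpolated_attention(query, kv_a, kv_b, 0.5)
midpoint_inner = inner_interpolated_attention(query, kv_a, kv_b, 0.5)
```

In this toy setup both variants reduce to ordinary attention on the first endpoint at t = 0 and on the second at t = 1; they differ only at intermediate values of t, which is where the choice between them would matter.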

Submission history

From: Qiyuan He
[v1] Tue, 26 Mar 2024 17:57:05 UTC (46,869 KB)
[v2] Thu, 18 Apr 2024 05:11:54 UTC (13,422 KB)
[v3] Fri, 4 Oct 2024 17:09:40 UTC (21,579 KB)

