FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation

stp2y, December 6, 2024

[Submitted on 3 Dec 2024 (v1), last revised 4 Dec 2024 (this version, v2)]

By Kefan Chen and 5 other authors

Abstract: Despite remarkable progress in image generation models, generating realistic hands remains a persistent challenge due to their complex articulation, varying viewpoints, and frequent occlusions. We present FoundHand, a large-scale domain-specific diffusion model for synthesizing single and dual hand images. To train our model, we introduce FoundHand-10M, a large-scale hand dataset with 2D keypoints and segmentation mask annotations. Our insight is to use 2D hand keypoints as a universal representation that encodes both hand articulation and camera viewpoint. FoundHand learns from image pairs to capture physically plausible hand articulations, natively enables precise control through 2D keypoints, and supports appearance control. Our model exhibits core capabilities that include the ability to repose hands, transfer hand appearance, and even synthesize novel views. This leads to zero-shot capabilities for fixing malformed hands in previously generated images, or synthesizing hand video sequences. We present extensive experiments and evaluations that demonstrate state-of-the-art performance of our method.

Submission history

From: Kefan Chen
[v1] Tue, 3 Dec 2024 18:58:19 UTC (41,386 KB)
[v2] Wed, 4 Dec 2024 20:51:17 UTC (41,386 KB)
