[Submitted on 8 Apr 2024]
GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts, by Zoltán Á. Milacski and 4 other authors
Abstract: The connection between our 3D surroundings and the descriptive language that characterizes them would be well-suited for localizing and generating human motion in context but for one problem. The complexity introduced by multiple modalities makes capturing this connection challenging with a fixed set of descriptors. Specifically, closed-vocabulary scene encoders, which require learning text-scene associations from scratch, have been favored in the literature, often resulting in inaccurate motion…