Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

stp2y, December 31, 2024

[Submitted on 9 Jul 2024 (v1), last revised 29 Dec 2024 (this version, v2)]

By Yue Zhang and 8 other authors

Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention in recent years, and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped both the challenges and the proposed methods in VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the current methods and future opportunities for leveraging foundation models to address VLN challenges. We hope our in-depth discussions can provide valuable resources and insights: on the one hand, to mark milestones in the field's progress and explore opportunities and potential roles for foundation models in it, and on the other, to organize the different challenges and solutions in VLN for foundation model researchers.

Submission history

From: Jialu Li
[v1] Tue, 9 Jul 2024 16:53:36 UTC (2,159 KB)
[v2] Sun, 29 Dec 2024 23:16:37 UTC (2,175 KB)
