Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation

By Ming Xu and 1 other author

Abstract: Most Vision-and-Language Navigation (VLN) algorithms are prone to inaccurate decisions because they lack visual common sense and have limited reasoning capabilities. To address this issue, we propose a Hierarchical Spatial Proximity Reasoning (HSPR) method. First, we introduce a scene understanding auxiliary task that helps the agent build a knowledge base of hierarchical spatial proximity. This task uses panoramic views and object features to identify node types and to uncover the adjacency relationships between nodes, between objects, and between nodes and objects. Second, we propose a multi-step reasoning navigation algorithm based on the hierarchical spatial proximity knowledge base, which continuously plans feasible paths to improve exploration efficiency. Third, we introduce a residual fusion method to improve navigation decision accuracy. Finally, we validate our approach with experiments on the publicly available REVERIE, SOON, R2R, and R4R datasets. Our code is available at this https URL
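
As a rough illustration of the idea of planning over a spatial proximity knowledge base, the toy sketch below stores node-to-node adjacency and node-to-object membership in plain dictionaries and uses breadth-first search to find a feasible path to a node that contains a target object. This is not the HSPR algorithm from the paper: the room names, objects, dictionary layout, and the `plan_path` function are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy knowledge base (illustrative only, not from the paper).
# Node-to-node adjacency: which locations are directly reachable from each other.
NODE_ADJACENCY = {
    "hallway": ["bedroom", "kitchen"],
    "bedroom": ["hallway", "bathroom"],
    "kitchen": ["hallway"],
    "bathroom": ["bedroom"],
}

# Node-to-object proximity: which objects are expected at each location.
NODE_OBJECTS = {
    "hallway": ["mirror"],
    "bedroom": ["bed", "lamp"],
    "kitchen": ["sink", "oven"],
    "bathroom": ["sink", "towel"],
}


def plan_path(start, target_object):
    """Return the shortest node path from `start` to a node containing
    `target_object`, or None if no such node is reachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if target_object in NODE_OBJECTS.get(node, []):
            return path
        for neighbor in NODE_ADJACENCY.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(plan_path("hallway", "towel"))
```

In this simplified picture, the learned knowledge base would replace the hand-written dictionaries, and replanning after each step would correspond to calling `plan_path` again from the agent's current node.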

Submission history

From: Ming Xu
[v1] Mon, 18 Mar 2024 07:51:22 UTC (6,073 KB)
[v2] Thu, 29 Aug 2024 02:13:09 UTC (8,806 KB)
[v3] Sun, 6 Oct 2024 04:35:30 UTC (9,788 KB)


