Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation, by Ming Xu and 1 other author
Abstract: Most Vision-and-Language Navigation (VLN) algorithms are prone to inaccurate decisions because they lack visual common sense and have limited reasoning capabilities. To address this issue, we propose a Hierarchical Spatial Proximity Reasoning (HSPR) method. First, we introduce a scene understanding auxiliary task to help the agent build a knowledge base of hierarchical spatial proximity. This task uses panoramic views and object features to identify node types and to uncover the adjacency relationships between nodes, between objects, and between nodes and objects. Second, we propose a multi-step reasoning navigation algorithm based on the hierarchical spatial proximity knowledge base, which continuously plans feasible paths to enhance exploration efficiency. Third, we introduce a residual fusion method to improve navigation decision accuracy. Finally, we validate our approach with experiments on publicly available datasets including REVERIE, SOON, R2R, and R4R. Our code is available at this https URL
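The abstract does not say how the proximity knowledge base is stored, how the multi-step planner searches it, or how residual fusion is computed. The Python sketch below is therefore only a toy illustration of the three ideas, not the authors' implementation. Every name in it (`ProximityKB`, `add_node_link`, `add_object`, `plan`, `residual_fuse`, the `weight` parameter) is hypothetical, the breadth-first search stands in for whatever planner HSPR actually uses, and "residual fusion" is read here, as an assumption, as adding a weighted knowledge-based correction to base scores.

```python
from collections import defaultdict, deque


class ProximityKB:
    """Toy store of adjacency between node types and between node types and objects."""

    def __init__(self):
        self.node_adj = defaultdict(set)      # node type -> neighbouring node types
        self.node_objects = defaultdict(set)  # node type -> objects observed there

    def add_node_link(self, a, b):
        # Adjacency between node types is treated as symmetric (an assumption).
        self.node_adj[a].add(b)
        self.node_adj[b].add(a)

    def add_object(self, node_type, obj):
        self.node_objects[node_type].add(obj)

    def plan(self, start, target_obj):
        """Breadth-first search for the shortest chain of node types
        that ends at a node type where target_obj has been observed."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if target_obj in self.node_objects[path[-1]]:
                return path
            for nxt in sorted(self.node_adj[path[-1]]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # no known node type contains the target


def residual_fuse(base_scores, kb_scores, weight=0.5):
    """Keep the base scores and add a weighted knowledge-based correction."""
    return [b + weight * k for b, k in zip(base_scores, kb_scores)]


# Hypothetical scene: node types and objects are made up for illustration.
kb = ProximityKB()
kb.add_node_link("hallway", "bedroom")
kb.add_node_link("bedroom", "bathroom")
kb.add_node_link("hallway", "kitchen")
kb.add_object("bathroom", "towel")
kb.add_object("kitchen", "oven")

route = kb.plan("hallway", "towel")
fused = residual_fuse([0.2, 0.7], [1.0, 0.0])
```

In this toy scene, `route` is the chain of node types from the hallway to a place where a towel was observed, and `fused` shows how a candidate with a low base score can be lifted when the knowledge base favours it.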
Submission history
From: Ming Xu
[v1] Mon, 18 Mar 2024 07:51:22 UTC (6,073 KB)
[v2] Thu, 29 Aug 2024 02:13:09 UTC (8,806 KB)
[v3] Sun, 6 Oct 2024 04:35:30 UTC (9,788 KB)