Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation
Quanting Xie and 9 other authors
Abstract: There is no limit to how much a robot might explore and learn, but all of that knowledge needs to be searchable and actionable. Within language research, retrieval-augmented generation (RAG) has become the workhorse of large-scale non-parametric knowledge; however, existing techniques do not directly transfer to the embodied domain, which is multimodal, where data is highly correlated, and where perception requires abstraction. To address these challenges, we introduce Embodied-RAG, a framework that enhances the foundation model of an embodied agent with a non-parametric memory system capable of autonomously constructing hierarchical knowledge for both navigation and language generation. Embodied-RAG handles a full range of spatial and semantic resolutions across diverse environments and query types, whether for a specific object or a holistic description of ambiance. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. This hierarchical organization allows the system to efficiently generate context-sensitive outputs across different robotic platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 250 explanation and navigation queries across kilometer-level environments, highlighting its promise as a general-purpose non-parametric system for embodied agents.
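The abstract describes the memory as a semantic forest whose nodes hold language descriptions at varying levels of detail, with retrieval descending from coarse summaries to specific locations. The sketch below is a minimal, hypothetical illustration of that idea in Python; the node layout, the keyword-overlap scorer, and the greedy top-down traversal are assumptions made for illustration, not the paper's actual implementation (which relies on foundation models for summarization and retrieval).

```python
# Minimal sketch of a hierarchical "semantic forest" memory (illustrative only).
from dataclasses import dataclass, field


@dataclass
class SemanticNode:
    description: str                                  # language summary at this level of detail
    pose: tuple | None = None                         # optional (x, y) location for navigation leaves
    children: list["SemanticNode"] = field(default_factory=list)


def score(query: str, node: SemanticNode) -> int:
    """Toy relevance score: keyword overlap between the query and a node's description."""
    return len(set(query.lower().split()) & set(node.description.lower().split()))


def retrieve(root: SemanticNode, query: str) -> SemanticNode:
    """Descend the tree greedily, picking the most relevant child at each level."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
    return node


if __name__ == "__main__":
    forest_root = SemanticNode(
        "university campus with cafes, labs, and gardens",
        children=[
            SemanticNode("quiet garden courtyard with benches", pose=(12.0, 3.5)),
            SemanticNode("busy cafe area near the main entrance", pose=(-4.0, 9.1)),
        ],
    )
    goal = retrieve(forest_root, "find a quiet place to sit")
    print(goal.description, goal.pose)   # -> quiet garden courtyard with benches (12.0, 3.5)
```

In this toy version, coarse nodes summarize whole regions and leaves carry poses that a navigation stack could consume; the paper's system plays the same structural role but builds and queries the forest with multimodal foundation models rather than keyword matching.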
Submission history
From: Quanting Xie
[v1] Thu, 26 Sep 2024 21:44:11 UTC (17,109 KB)
[v2] Tue, 1 Oct 2024 20:32:17 UTC (17,112 KB)
[v3] Thu, 3 Oct 2024 15:17:22 UTC (15,725 KB)
[v4] Tue, 8 Oct 2024 15:07:16 UTC (15,725 KB)
[v5] Tue, 21 Jan 2025 02:38:32 UTC (15,761 KB)