Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction, by Menglin Xia and 5 other authors
Abstract: Large language models (LLMs) enhanced with retrieval augmentation have shown great performance in many applications. However, the computational demands of these models pose a challenge when applying them to real-time tasks, such as composition assistance. To address this, we propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA), a novel system for real-time text prediction that efficiently combines a cloud-based LLM with a smaller client-side model through retrieval-augmented memory. This integration enables the client model to generate better responses, benefiting from the LLM's capabilities and cloud-based data. Meanwhile, via a novel asynchronous memory update mechanism, the client model can deliver real-time completions to user inputs without waiting for responses from the cloud. Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
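The asynchronous memory update mentioned in the abstract can be pictured with a small sketch. This is not the authors' implementation: the class `HybridCompleter`, the method `request_memory_update`, and the `cloud_fn` callback are hypothetical names chosen for illustration, and simple prefix matching stands in for the small client-side language model. The sketch only shows the control flow, in which completions are served from whatever memory is already on the client while a background thread fetches fresh memory from the cloud.

```python
import threading


class HybridCompleter:
    """Toy client that never blocks on the cloud (illustrative assumption only)."""

    def __init__(self, cloud_fn):
        # cloud_fn is a hypothetical slow call: context -> list of memory strings.
        self._cloud_fn = cloud_fn
        self._memory = []               # memory currently cached on the client
        self._lock = threading.Lock()   # guards swaps of the cached memory

    def request_memory_update(self, context):
        """Start a background refresh and return immediately with the thread."""
        def worker():
            fresh = self._cloud_fn(context)   # slow; runs off the typing path
            with self._lock:
                self._memory = list(fresh)    # swap in the new memory

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def complete(self, prefix):
        """Return a completion using only the memory available right now."""
        with self._lock:
            snapshot = list(self._memory)
        # Stand-in for the client model: first memory entry extending the prefix.
        for entry in snapshot:
            if entry.startswith(prefix):
                return entry[len(prefix):]
        return ""
```

In this sketch, a call to `complete` made before the cloud responds returns an empty string rather than waiting, and later calls pick up the refreshed memory once the background thread has swapped it in.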
Submission history
From: Menglin Xia
[v1] Tue, 8 Aug 2023 12:27:20 UTC (1,480 KB)
[v2] Mon, 5 Feb 2024 14:55:19 UTC (1,511 KB)
[v3] Sat, 12 Oct 2024 12:50:33 UTC (1,656 KB)