LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

arXiv:2411.13009v1 Announce Type: cross
Abstract: Although large language models (LLMs) show impressive performance on complex tasks, they still struggle with long-context understanding and incur high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces the runtime delay by up to 4.8x compared to recent attention steering methods.

By stp2y
