TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation


View a PDF of the paper titled TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation, by Alfredo Garrachón Ruiz, Tomás de la Rosa and Daniel Borrajo


Abstract: The inference cost of Large Language Models (LLMs) is a significant challenge due to their computational demands, especially on tasks requiring long outputs. However, natural language often contains redundancy, which presents an opportunity for optimization. We have observed that, when prompted appropriately, LLMs can generate concise outputs in a distilled language that retain the essential meaning. We propose TRIM, a pipeline for saving computational cost in which a shorter distilled output from the LLM is reconstructed into a full narrative by a smaller model with lower inference costs. Our experiments show promising results, particularly in general knowledge domains, with 20.58% of tokens saved on average and only a minor decrease in evaluation metrics, suggesting that this approach can effectively balance efficiency and accuracy in language processing tasks.
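The two-stage pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: both model calls are stubbed with hard-coded strings, and the function names (`large_model_distill`, `small_model_reconstruct`, `trim_pipeline`) are hypothetical.

```python
# Hypothetical sketch of the TRIM idea: the expensive LLM is prompted to
# produce a distilled (token-reduced) draft, and a cheaper small model
# expands that draft back into a full narrative. Model calls are stubbed.

def large_model_distill(prompt: str) -> str:
    """Stand-in for the costly LLM, prompted to answer in distilled language."""
    # In practice this would call the LLM with an instruction such as
    # "answer as concisely as possible, telegraphic style".
    return "Eiffel Tower: Paris landmark, built 1889, 330 m tall."

def small_model_reconstruct(distilled: str) -> str:
    """Stand-in for the smaller, cheaper model that restores a full narrative."""
    return ("The Eiffel Tower is a landmark in Paris. "
            "It was built in 1889 and stands 330 metres tall.")

def trim_pipeline(prompt: str) -> tuple[str, float]:
    """Run distillation then reconstruction; report large-model token savings."""
    distilled = large_model_distill(prompt)
    full = small_model_reconstruct(distilled)
    # Savings for the expensive model, approximated here by word count.
    saved = 1 - len(distilled.split()) / len(full.split())
    return full, saved

answer, saved = trim_pipeline("Tell me about the Eiffel Tower.")
print(f"{answer} (large-model tokens saved: {saved:.0%})")
```

The cost argument rests on the distilled draft being shorter than the final narrative: the large model pays for the short sequence, while only the cheap model generates the long one.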

Submission history

From: Alfredo Garrachon
[v1]
Tue, 10 Dec 2024 17:13:35 UTC (294 KB)
[v2]
Mon, 16 Dec 2024 12:06:25 UTC (262 KB)


