RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

By Aleksandar Botev and 61 other authors

Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants of both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
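For intuition, here is a minimal sketch of the two ingredients the abstract names: a per-channel linear recurrence whose state has a fixed size regardless of sequence length, and a sliding-window (local) attention whose cache is likewise bounded. This is not the RecurrentGemma or Griffin implementation; the gating scheme, window size, and names below are illustrative assumptions only.

```python
import numpy as np

def linear_recurrence(x, a, b):
    """Scan a diagonal linear recurrence h_t = a * h_{t-1} + b * x_t.

    The state h has a fixed size no matter how long the sequence is,
    which is why inference memory does not grow with context length.
    x: (seq_len, d) inputs; a, b: (d,) per-channel gates (assumed here).
    """
    h = np.zeros(x.shape[1], dtype=x.dtype)
    states = []
    for x_t in x:
        h = a * h + b * x_t          # constant-size state update
        states.append(h.copy())
    return np.stack(states)

def local_attention(x, window=4):
    """Plain sliding-window self-attention over the last `window` steps only."""
    seq_len, d = x.shape
    out = np.zeros_like(x)
    for t in range(seq_len):
        start = max(0, t - window + 1)
        keys = x[start:t + 1]                    # bounded window, bounded cache
        scores = keys @ x[t] / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ keys
    return out

# Toy usage: compose a recurrent block with a local-attention block.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8)).astype(np.float32)
a = np.full(8, 0.9, dtype=np.float32)   # decay gate value is an assumption
b = 1.0 - a
y = local_attention(linear_recurrence(x, a, b))
print(y.shape)  # (16, 8)
```

Both components touch only a bounded amount of state per step, in contrast to full self-attention, whose key-value cache grows linearly with the sequence.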

Submission history

From: Soham De
[v1] Thu, 11 Apr 2024 15:27:22 UTC (572 KB)
[v2] Wed, 28 Aug 2024 15:05:42 UTC (591 KB)
