[Submitted on 11 Apr 2024 (v1), last revised 28 Aug 2024 (this version, v2)]
Authors: Aleksandar Botev,
Soham De,
Samuel L Smith,
Anushan Fernando,
George-Cristian Muraru,
Ruba Haroun,
Leonard Berrada,
Razvan Pascanu,
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Sertan Girgin,
Olivier Bachem,
Alek Andreev,
Kathleen Kenealy,
Thomas Mesnard,
Cassidy Hardin,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
Armand Joulin,
Noah Fiedel,
Evan Senter,
Yutian Chen,
Srivatsan Srinivasan,
Guillaume Desjardins,
David Budden,
Arnaud Doucet,
Sharad Vikram,
Adam Paszke,
Trevor Gale,
Sebastian Borgeaud,
Charlie Chen,
Andy Brock,
Antonia Paterson,
Jenny Brennan,
Meg Risdal,
Raj Gundluru,
Nesh Devanathan,
Paul Mooney,
Nilay Chauhan,
Phil Culliton,
Luiz Gustavo Martins,
Elisa Bandy,
David Huntsperger,
Glenn Cameron,
Arthur Zucker,
Tris Warkentin,
Ludovic Peran,
Minh Giang,
Zoubin Ghahramani,
Clément Farabet,
Koray Kavukcuoglu,
Demis Hassabis,
Raia Hadsell,
Yee Whye Teh,
Nando de Freitas
Title: RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Abstract: We introduce RecurrentGemma, a family of open language models that uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two model sizes, with 2B and 9B parameters, each in pre-trained and instruction-tuned variants. Our models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.
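To make the fixed-size state claim concrete, here is a toy sketch in Python. It is not the Griffin layer or any RecurrentGemma code: the function `run_fixed_state` and its parameters `decay` and `window` are hypothetical names chosen for illustration, the recurrence is a plain scalar exponential average, and the bounded buffer only stands in for the idea that local attention looks at a limited window of recent inputs.

```python
from collections import deque


def run_fixed_state(tokens, decay=0.9, window=4):
    """Consume a stream while keeping a bounded amount of state.

    Assumption: a scalar linear recurrence plus a fixed-length buffer,
    used only to illustrate constant memory; this is not Griffin itself.
    """
    h = 0.0                         # recurrent state (a single scalar here)
    recent = deque(maxlen=window)   # holds at most `window` recent inputs
    for x in tokens:
        # Linear recurrence: new state is a weighted mix of old state and input.
        h = decay * h + (1.0 - decay) * x
        # Appending to a full deque silently drops the oldest element.
        recent.append(x)
    return h, list(recent)


h_short, buf_short = run_fixed_state(range(10))
h_long, buf_long = run_fixed_state(range(10_000))

# The retained buffer has the same length for both sequence lengths.
print(len(buf_short), len(buf_long))
```

In this sketch the memory held between steps is one number plus at most `window` inputs, regardless of how many tokens have been processed, which is the property the abstract contrasts with state that grows with sequence length.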
Submission history
From: Soham De
[v1] Thu, 11 Apr 2024 15:27:22 UTC (572 KB)
[v2] Wed, 28 Aug 2024 15:05:42 UTC (591 KB)