Need a Small Specialized Language Model? Plan Early!

By David Grangier and 3 other authors

Abstract: Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference, but their lower capacity means that their performance can be good only if one limits their scope to a specialized domain. This paper explores how to get good specialized small language models using a large, generic, pretraining set and a limited amount of specialized data. We consider two scenarios, depending on whether (i) one can afford pretraining a model for each specialization task, or (ii) one wants to cheaply adapt a single pretrained model for each task. In the first scenario, we propose an effective solution based on importance sampling: we resample the pretraining set to imitate the specialization data and train a small model on it. In the second scenario, we propose a novel architecture, projected networks (PN). PN is a large network whose parameters can be linearly projected into a small network for specialization. For both scenarios, we demonstrate the empirical effectiveness of our solutions across various domains, training set sizes, and training budgets.
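
Both proposals lend themselves to short sketches. For the first scenario, importance sampling reweights generic pretraining examples so that the resampled set imitates the specialization data. Below is a minimal sketch assuming a cluster-based instantiation: embed both corpora, cluster the pretraining set, and weight each pretraining example by the ratio of specialized-to-generic frequency of its cluster. The function and argument names are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def resample_pretraining(pretrain_emb, special_emb, n_clusters=64,
                         n_samples=100_000, seed=0):
    """Return indices of a resampled pretraining set whose cluster
    histogram imitates that of the specialization data."""
    rng = np.random.default_rng(seed)
    # Cluster the generic pretraining embeddings once.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(pretrain_emb)
    pre_ids = km.labels_                # cluster id of each pretraining example
    spec_ids = km.predict(special_emb)  # cluster id of each specialization example
    # Smoothed empirical cluster frequencies under both distributions.
    p_pre = np.bincount(pre_ids, minlength=n_clusters).astype(float) + 1e-9
    p_spec = np.bincount(spec_ids, minlength=n_clusters).astype(float) + 1e-9
    p_pre /= p_pre.sum()
    p_spec /= p_spec.sum()
    # Importance weight of each example: target / generic frequency of its cluster.
    w = (p_spec / p_pre)[pre_ids]
    w /= w.sum()
    # Sample with replacement; the small model is then pretrained on these examples.
    return rng.choice(len(pretrain_emb), size=n_samples, replace=True, p=w)
```

For the second scenario, a projected network keeps one large shared parameter set and derives each small specialized model by a linear projection of those parameters. The sketch below applies a two-sided projection to a single linear layer; this parameterization is an assumption for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class ProjectedLinear(nn.Module):
    """Small linear layer whose weight is a linear projection of a large one.

    Hypothetical sketch: the class name and the two-sided projection are
    illustrative assumptions, not the paper's verbatim parameterization.
    """
    def __init__(self, big_in, big_out, small_in, small_out):
        super().__init__()
        # Shared large-model weight (pretrained once, reused across tasks).
        self.big_weight = nn.Parameter(torch.randn(big_out, big_in) / big_in**0.5)
        # Task-specific projection matrices (cheap to train per specialization).
        self.proj_out = nn.Parameter(torch.randn(small_out, big_out) / big_out**0.5)
        self.proj_in = nn.Parameter(torch.randn(big_in, small_in) / big_in**0.5)
        self.bias = nn.Parameter(torch.zeros(small_out))

    def small_weight(self) -> torch.Tensor:
        # Linearly project the large parameters into the small shape.
        return self.proj_out @ self.big_weight @ self.proj_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.small_weight().T + self.bias

# Usage: specialize to a small width, then run inference at small-model cost.
layer = ProjectedLinear(big_in=4096, big_out=4096, small_in=512, small_out=512)
y = layer(torch.randn(8, 512))
```

At deployment, small_weight() can be materialized once per task, so inference runs at the small model's cost even though the large shared parameters were trained only once.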

Submission history

From: Pierre Ablin
[v1] Fri, 2 Feb 2024 01:45:18 UTC (137 KB)
[v2] Thu, 31 Oct 2024 15:56:08 UTC (324 KB)


