MLPs Learn In-Context on Regression and Classification Tasks
William L. Tong and Cengiz Pehlevan
Abstract: In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget in this setting. We further show that MLPs outperform Transformers on a series of classical tasks from psychology designed to test relational reasoning, which are closely related to in-context classification. These results underscore a need for studying in-context learning beyond attention-based architectures, while also challenging strong prior arguments about MLPs' limited ability to solve relational tasks. Altogether, our results highlight the unexpected competence of MLPs, and support the growing interest in all-MLP alternatives to task-specific architectures.
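To make the setup concrete, below is a minimal sketch of a commonly used synthetic in-context regression task of the kind the abstract refers to, and of how the exemplar context can be flattened into a single vector so that an MLP (rather than a sequence model) can consume it. The function name, parameter choices, and flattening scheme are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def make_icl_regression_batch(n_tasks=64, n_exemplars=8, dim=4, noise=0.1, rng=None):
    """Hypothetical synthetic in-context linear-regression tasks.

    Each task draws its own weight vector w, presents n_exemplars (x, y)
    pairs plus a query x, and asks the model to predict the query's y.
    For an MLP, the whole context is flattened into one input vector.
    """
    rng = rng or np.random.default_rng(0)
    w = rng.normal(size=(n_tasks, dim))                    # per-task regression weights
    xs = rng.normal(size=(n_tasks, n_exemplars + 1, dim))  # exemplar inputs + query input
    ys = np.einsum('td,tnd->tn', w, xs) + noise * rng.normal(size=(n_tasks, n_exemplars + 1))

    # Context: exemplar (x, y) pairs plus the query x, flattened to one vector per task.
    context = np.concatenate(
        [xs[:, :-1].reshape(n_tasks, -1),   # exemplar inputs
         ys[:, :-1],                        # exemplar targets
         xs[:, -1]],                        # query input (its target is withheld)
        axis=1)
    target = ys[:, -1]                      # value the model must infer in-context
    return context, target

contexts, targets = make_icl_regression_batch()
print(contexts.shape, targets.shape)  # (64, 44) (64,) with the defaults above
```

Solving this task requires inferring each task's latent weights from the exemplars at inference time; no fixed input-output mapping works across tasks, which is what makes it a test of in-context learning rather than ordinary supervised fitting.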
Submission history
From: William Tong
[v1] Fri, 24 May 2024 15:04:36 UTC (5,540 KB)
[v2] Thu, 26 Sep 2024 16:05:30 UTC (5,725 KB)