Transformer Block Coupling and its Correlation with Generalization in LLMs

By Murdock Aubry and 3 other authors

Abstract: Large Language Models (LLMs) have made significant strides in natural language processing, and a precise understanding of the internal mechanisms driving their success is essential. In this work, we trace the trajectories of individual tokens as they pass through transformer blocks, and linearize the system along these trajectories through their Jacobian matrices. By examining the relationships between these Jacobians, we uncover a $\textbf{transformer block coupling}$ phenomenon in a variety of LLMs, characterized by the coupling of their top singular vectors across tokens and depth. Our findings reveal that coupling $\textit{positively correlates}$ with model performance, and that this relationship is stronger than with other hyperparameters, namely parameter budget, model depth, and embedding dimension. We further investigate the emergence of these properties through training, noting the development of coupling, as well as an increase in linearity and layer-wise exponential growth in the token trajectories. These collective insights provide a novel perspective on the interactions between token embeddings, and prompt further approaches to study training and generalization in LLMs.
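To make the procedure described in the abstract concrete, below is a minimal illustrative sketch (not the authors' implementation) of the two ingredients: linearizing a transformer block along a single token's trajectory via its Jacobian, and scoring how well the top singular vectors of two such Jacobians align. The helper names block_jacobian and coupling_score, the subspace dimension k, and the assumption that a block maps hidden states to hidden states (returning a tuple whose first element is the output, as in HuggingFace-style decoder layers) are assumptions made here for illustration.

import torch

def block_jacobian(block, hidden_state):
    # Jacobian of one transformer block with respect to a single token's
    # hidden state. hidden_state is a (d,) vector; the block is applied to a
    # (1, 1, d) sequence, so the result is a (d, d) matrix.
    def f(x):
        out = block(x.view(1, 1, -1))
        if isinstance(out, tuple):  # HuggingFace-style layers return a tuple
            out = out[0]
        return out.view(-1)
    return torch.autograd.functional.jacobian(f, hidden_state)

def coupling_score(J1, J2, k=8):
    # Alignment of the top-k left singular subspaces of two Jacobians.
    # Returns a value in [0, 1]: 1 when the top singular vectors of J1 and J2
    # span the same subspace (strong coupling), 0 when the subspaces are
    # orthogonal.
    U1, _, _ = torch.linalg.svd(J1)
    U2, _, _ = torch.linalg.svd(J2)
    overlap = U1[:, :k].T @ U2[:, :k]          # (k, k) cross-projection
    return (overlap.norm() ** 2 / k).item()    # normalized subspace alignment

In the setting the abstract describes, a score like this would be evaluated across pairs of blocks (depth) and across token positions along a prompt's trajectory, and the resulting coupling strength compared against downstream model performance.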

Submission history

From: Haoming Meng
[v1]
Wed, 10 Jul 2024 16:30:27 UTC (43,199 KB)
[v2]
Mon, 14 Oct 2024 04:29:05 UTC (40,676 KB)
[v3]
Sun, 22 Dec 2024 06:54:57 UTC (43,977 KB)


