On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding



Authors: Kevin Xu, Issei Sato

Abstract: Looped Transformers offer advantages in parameter efficiency and Turing completeness. However, their expressive power for function approximation, and the corresponding approximation rate, remain underexplored. In this paper, we establish approximation rates of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This analysis reveals a limitation specific to the looped architecture and prompts us to incorporate scaling parameters for each loop, conditioned on timestep encoding. Experimental results demonstrate that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding architecture.
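
As a rough illustration of the architectural idea the abstract describes, here is a minimal sketch assuming a PyTorch implementation: one shared Transformer block is applied for a fixed number of loops, and a learned timestep (loop-index) embedding produces per-loop scale and shift parameters. The class, module, and parameter names are hypothetical and are not taken from the paper.

```python
# Sketch only: a looped Transformer with per-loop scaling conditioned on a
# timestep encoding. This is an assumed, illustrative implementation, not the
# authors' code.
import torch
import torch.nn as nn


class TimestepScaledLoopedBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_loops: int):
        super().__init__()
        self.n_loops = n_loops
        # One shared block reused across all loops (parameter efficiency).
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Timestep encoding: a learned embedding per loop index, mapped to a
        # per-loop scale and shift applied to the block's output.
        self.timestep_emb = nn.Embedding(n_loops, d_model)
        self.to_scale_shift = nn.Linear(d_model, 2 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for t in range(self.n_loops):
            h = self.block(x)
            # Condition the scaling on the current loop's timestep encoding.
            t_emb = self.timestep_emb(torch.tensor(t, device=x.device))
            scale, shift = self.to_scale_shift(t_emb).chunk(2, dim=-1)
            # Residual update with per-loop scaling and shift.
            x = x + (1.0 + scale) * h + shift
        return x


# Usage: 10 loops of one shared block over a batch of token embeddings.
model = TimestepScaledLoopedBlock(d_model=64, n_heads=4, n_loops=10)
out = model(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```

The design choice sketched here, sharing one block across loops while letting only a small set of timestep-conditioned parameters vary per iteration, keeps the parameter count essentially constant as the number of loops grows.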

Submission history

From: Kevin Xu [view email]
[v1]
Wed, 2 Oct 2024 10:31:17 UTC (1,465 KB)
[v2]
Tue, 8 Oct 2024 16:41:40 UTC (974 KB)
[v3]
Mon, 25 Nov 2024 08:17:14 UTC (1,169 KB)


