Superposition in Transformers: A Novel Way of Building Mixture of Experts
Ayoub Ben Chaliah and one other author
Abstract: Catastrophic forgetting remains a major challenge when adapting large language models (LLMs) to new tasks or domains. Conventional fine-tuning often overwrites existing knowledge, causing performance degradation on original tasks. We introduce Superposition in Transformers, a novel architecture that leverages autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. By using B-spline-based blending coefficients and autoencoders that adaptively reconstruct hidden states based on the input data distribution, our method effectively mitigates catastrophic forgetting and enables a new paradigm of “in-model” superposition. This approach preserves original model capabilities while allowing compact domain-specific expertise to be added, and it supports dynamic switching between model states during inference.
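To make the idea concrete, the sketch below shows one plausible reading of the abstract: hidden states from a base model and a fine-tuned model are blended with an input-dependent coefficient, and an autoencoder reconstructs the blended representation from a shared bottleneck. This is not the authors' implementation; for brevity it replaces the paper's B-spline-based blending coefficients with a simple learned sigmoid gate, and the class name, bottleneck size, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): blend two models' hidden
# states with an input-dependent coefficient and reconstruct the result with a
# small autoencoder. A sigmoid gate stands in for the B-spline coefficients.
import torch
import torch.nn as nn

class HiddenBlender(nn.Module):  # hypothetical name
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        # Gating network: per-token blending coefficient in [0, 1].
        self.gate = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())
        # Autoencoder that reconstructs hidden states from a shared bottleneck.
        self.encoder = nn.Linear(d_model, bottleneck)
        self.decoder = nn.Linear(bottleneck, d_model)

    def forward(self, h_base: torch.Tensor, h_finetuned: torch.Tensor):
        # h_base, h_finetuned: (batch, seq_len, d_model) hidden states taken
        # from the same layer of the base and fine-tuned models.
        alpha = self.gate(h_base)                      # (batch, seq_len, 1)
        h_mix = alpha * h_base + (1 - alpha) * h_finetuned
        h_rec = self.decoder(self.encoder(h_mix))      # reconstructed state
        return h_rec, alpha

# Usage with random stand-in hidden states:
blender = HiddenBlender(d_model=768)
h_b, h_f = torch.randn(2, 16, 768), torch.randn(2, 16, 768)
h_out, alpha = blender(h_b, h_f)
print(h_out.shape, alpha.shape)  # (2, 16, 768) and (2, 16, 1)
```

In this reading, keeping both sets of weights frozen and training only the gate and autoencoder is what would let domain expertise be added without overwriting the base model, and steering the coefficient toward 0 or 1 would correspond to the dynamic switching between model states mentioned in the abstract.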
Submission history
From: Hela Dellagi
[v1] Tue, 31 Dec 2024 16:28:23 UTC (1,605 KB)
[v2] Mon, 6 Jan 2025 23:02:42 UTC (1,771 KB)