Mathematical Models of Computation in Superposition

arXiv:2408.05451v1
Abstract: Superposition — when a neural network represents more “features” than it has dimensions — seems to pose a serious challenge to mechanistically interpreting current AI systems. Existing theory work studies \emph{representational} superposition, where superposition is only used when passing information through bottlenecks. In this work, we present mathematical models of \emph{computation} in superposition, where superposition is actively helpful for efficiently accomplishing the task.
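To make the distinction concrete, here is a minimal sketch of the representational kind, assuming only the standard setup the abstract alludes to: $m$ sparse boolean features are stored along random directions in $d < m$ dimensions and read back with thresholded dot products. The sizes, seed, and 0.5 threshold are illustrative choices, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 1000, 200, 3   # m features squeezed into d dims, ~k active at once

# Random unit "feature directions" are nearly orthogonal when d is large.
W = rng.normal(size=(m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse boolean input: k of the m features are on.
x = np.zeros(m)
x[rng.choice(m, size=k, replace=False)] = 1.0

h = x @ W                              # superposed d-dimensional representation
scores = h @ W.T                       # correlate with every feature direction
x_hat = (scores > 0.5).astype(float)   # threshold away interference noise

print("mismatched features:", int(np.sum(x != x_hat)), "out of", m)
```

Interference between random directions scales like $\sqrt{k/d}$, so recovery is reliable precisely when the features are sparse; this is superposition used only to pass information through a bottleneck, with no computation performed on it.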
We first construct the task of efficiently emulating a circuit that takes the AND of each of the $\binom{m}{2}$ pairs of $m$ features. We construct a 1-layer MLP that uses superposition to perform this task up to $\varepsilon$-error, where the network requires only $\tilde{O}(m^{\frac{2}{3}})$ neurons, even when the input features are \emph{themselves in superposition}. We generalize this construction to arbitrary sparse boolean circuits of low depth, and then construct “error correction” layers that allow deep fully-connected networks of width $d$ to emulate circuits of width $\tilde{O}(d^{1.5})$ and \emph{any} polynomial depth. We conclude by outlining potential applications of our work to interpreting neural networks that implement computation in superposition.
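As a point of reference for the task, the following hedged sketch emulates the pairwise-AND circuit naively, with one ReLU neuron per pair (about $m^2/2$ neurons), reading each feature approximately out of a superposed input as above. It relies on the boolean identity $\mathrm{AND}(a,b) = \mathrm{ReLU}(a + b - 1)$. This is not the paper's $\tilde{O}(m^{\frac{2}{3}})$-neuron construction; the dimensions and the 0.3 tolerance are arbitrary illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, d, k = 50, 400, 3                     # m superposed inputs, ~k active

W = rng.normal(size=(m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

x = np.zeros(m)
x[rng.choice(m, size=k, replace=False)] = 1.0
h = x @ W                                # inputs arrive in superposition

relu = lambda z: np.maximum(z, 0.0)

bad = 0
for i, j in combinations(range(m), 2):
    # For boolean a, b: AND(a, b) = ReLU(a + b - 1). Here a and b are only
    # approximate reads w_i . h and w_j . h, so the output carries
    # interference noise on the order of sqrt(k/d).
    out = relu(W[i] @ h + W[j] @ h - 1.0)
    bad += abs(out - x[i] * x[j]) > 0.3  # illustrative epsilon tolerance
print(f"pairs outside tolerance: {bad} of {m * (m - 1) // 2}")
```

The point of the paper's construction is that these $\binom{m}{2}$ outputs can themselves be produced in superposition by far fewer shared neurons, with the error-correction layers keeping the interference noise from compounding with depth.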


