Published 31 July 2024 · Language Model Interpretability team

Announcing a comprehensive, open suite of sparse autoencoders for language model interpretability.

To create an artificial intelligence (AI) language model, researchers build a system that learns from vast amounts of data without human guidance. As a result, the inner workings of language models are often a mystery, even to the researchers who train them. Mechanistic interpretability is a research field focused on deciphering these inner workings. Researchers in this field use sparse autoencoders as a kind of ‘microscope’ that lets them see inside a language model and get a better sense…
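To make the ‘microscope’ analogy a little more concrete, here is a minimal, illustrative sketch in PyTorch of the basic sparse autoencoder idea: activations from inside a language model are encoded into a wider, mostly-zero feature space and then reconstructed. This is only a sketch of the general technique, not the architecture released in this suite; the names (`d_model`, `d_features`, `l1_coeff`) and the plain ReLU-plus-L1 sparsity penalty are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder sketch (illustrative, not the released architecture).

    Encodes model activations into a wider feature space, where an L1 penalty
    pushes most features to zero, then reconstructs the original activations.
    """

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # d_features is typically much larger than d_model,
        # so each learned feature can specialise.
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps feature activations non-negative and encourages sparsity.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(activations, reconstruction, features, l1_coeff=1e-3):
    # Trade off faithful reconstruction against sparse feature usage.
    recon_loss = (reconstruction - activations).pow(2).mean()
    sparsity_loss = features.abs().mean()  # L1 penalty on feature activations
    return recon_loss + l1_coeff * sparsity_loss


# Hypothetical usage on a batch of residual-stream activations:
sae = SparseAutoencoder(d_model=2048, d_features=16384)
x = torch.randn(8, 2048)               # stand-in for real model activations
recon, feats = sae(x)
loss = sae_loss(x, recon, feats)
```

The sparse, mostly-zero features are what make the ‘microscope’ useful: because only a handful of features fire on any given input, each one tends to correspond to a more human-inspectable pattern than the dense activations it was trained on.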