LoQT: Low-Rank Adapters for Quantized Pre-Training

By Sebastian Loeschcke and 3 other authors

Abstract: Training of large neural networks requires significant computational resources. Despite advances using low-rank adapters and quantization, pretraining of models such as LLMs on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose LoQT, a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning of models, which we demonstrate experimentally for language modeling and downstream task adaptation. We find that LoQT enables efficient training of models up to 7B parameters on a consumer-grade 24GB GPU. We also demonstrate the feasibility of training a 13B parameter model using per-layer gradient updates on the same hardware.
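
As a rough illustration of the approach described in the abstract, the sketch below shows a linear layer that keeps a frozen, quantized full-rank weight alongside trainable low-rank factors that are periodically merged back into it. This is a minimal sketch assuming PyTorch; the names (LoQTLinear, quantize, dequantize, merge) and the toy quantizer are illustrative placeholders, not the authors' implementation, and the gradient-based initialization of the low-rank factors is omitted for brevity.

```python
# Minimal sketch, assuming PyTorch, of a layer in the spirit of LoQT:
# a frozen quantized full-rank weight plus trainable low-rank factors
# that are periodically merged back into the quantized weight.
# Names and the toy quantizer are illustrative, not the paper's code.
import torch
import torch.nn as nn


def quantize(w: torch.Tensor, bits: int = 4):
    # Toy symmetric per-tensor quantization; stands in for NF4 or similar.
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    q = torch.clamp(torch.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q.to(torch.int8), scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale


class LoQTLinear(nn.Module):
    # Hypothetical layer: frozen quantized weight Q plus trainable A @ B.
    def __init__(self, weight: torch.Tensor, rank: int = 16):
        super().__init__()
        q, scale = quantize(weight)
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        out_features, in_features = weight.shape
        # LoQT initializes these factors from a gradient-based factorization;
        # the random/zero init here is only to keep the sketch self-contained.
        self.A = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = dequantized full-rank weight + low-rank update.
        w = dequantize(self.q_weight, self.scale) + self.A @ self.B
        return x @ w.t()

    @torch.no_grad()
    def merge(self):
        # Periodic merge: fold the low-rank update into the quantized weight,
        # re-quantize, and reset the factors for the next training interval.
        w = dequantize(self.q_weight, self.scale) + self.A @ self.B
        q, scale = quantize(w)
        self.q_weight.copy_(q)
        self.scale.copy_(scale)
        self.A.normal_(std=0.01)
        self.B.zero_()
```

In such a setup, a training loop would take optimizer steps only on the low-rank factors A and B and call merge() at a fixed interval, so the quantized full-rank weights themselves never receive gradients.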

Submission history

From: Sebastian Loeschcke
[v1] Sun, 26 May 2024 11:29:57 UTC (275 KB)
[v2] Mon, 26 Aug 2024 14:59:53 UTC (271 KB)
[v3] Mon, 9 Sep 2024 14:31:26 UTC (271 KB)


