1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs



By Jinheng Wang and 7 other authors


Abstract: Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, offer a promising route to improving the speed and energy efficiency of LLMs. These developments also enable local LLM deployment across a broad range of devices. In this work, we introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit LLMs. Specifically, we develop a set of kernels to support fast and lossless inference of ternary BitNet b1.58 LLMs on CPUs. Extensive experiments demonstrate that bitnet.cpp achieves significant speedups, ranging from 2.37x to 6.17x on x86 CPUs and from 1.37x to 5.07x on ARM CPUs, across various model sizes. The code is available at this https URL.
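The core idea behind ternary (1.58-bit) inference is that every weight is restricted to {-1, 0, +1}, so a matrix-vector product reduces to additions and subtractions of activations, with no multiplications. The following is a minimal NumPy sketch of that property, not the paper's optimized CPU kernels (which use packed bit layouts and SIMD); the function name and shapes are illustrative assumptions.

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply a ternary weight matrix W (entries in {-1, 0, +1})
    by an activation vector x using only adds and subtracts,
    mirroring the multiplication-free property of 1.58-bit weights."""
    # Sum activations where the weight is +1, subtract where it is -1;
    # zero weights contribute nothing.
    return (x * (W == 1)).sum(axis=1) - (x * (W == -1)).sum(axis=1)

# The result matches an ordinary floating-point matmul exactly,
# which is why this style of inference can be lossless.
W = np.array([[1, 0, -1], [0, 1, 1]])
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(W, x))  # equals W @ x
```

Real kernels additionally pack several ternary weights into each byte and use lookup tables or SIMD instructions to process many weights per cycle, but the arithmetic identity above is what makes those optimizations possible.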

Submission history

From: Shaoguang Mao [view email]
[v1]
Mon, 21 Oct 2024 16:14:57 UTC (399 KB)
[v2]
Wed, 23 Oct 2024 11:17:42 UTC (400 KB)


