Hasty Briefs

Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

a year ago
  • #AI
  • #Quantization
  • #Neural Networks
  • Modern AI models typically use 16- or 32-bit floating point numbers for storing weights, which require large memory and processing resources.
  • Microsoft's General Artificial Intelligence group has developed a new ternary neural network model using only -1, 0, or 1 as weight values.
  • This ternary architecture reduces complexity and improves computational efficiency, allowing it to run effectively on a desktop CPU.
  • Despite reduced weight precision, the model claims performance comparable to full-precision models of similar size.
  • Previous quantization techniques have focused on reducing memory usage, with extreme cases like BitNets using single-bit weights.
  • The new BitNet b1.58 2B4T model is a ternary system, referred to as "1.58-bit" because three possible states carry log2(3) ≈ 1.58 bits of information, and is billed as the first open-source, natively trained 1-bit LLM at this scale.
  • Unlike post-training quantization, which can degrade performance, BitNet b1.58 2B4T is trained natively in ternary form, avoiding significant performance loss.
  • The model has 2 billion parameters and was trained on 4 trillion tokens, yet reportedly matches full-precision models of similar size.
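The ternary weighting described above can be sketched in a few lines. This is an illustrative "absmean"-style quantizer, not Microsoft's actual implementation; the function name, the example weights, and the small epsilon guard are assumptions for the sketch:

```python
import numpy as np

def ternary_quantize(w):
    # Scale by the mean absolute weight, then snap each weight to the
    # nearest of {-1, 0, 1}. The original tensor is approximated as q * scale.
    scale = np.abs(w).mean()
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1)
    return q.astype(np.int8), scale

w = np.array([0.31, -0.04, 1.2, -0.9, 0.02])
q, s = ternary_quantize(w)
# q now contains only -1, 0, or 1
```

Because every weight is one of three values, matrix multiplications reduce to additions, subtractions, and skips (no weight multiplies), which is the source of the CPU-friendly efficiency claimed for the model.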