Hasty Briefs

Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

a year ago
  • #AI
  • #Quantization
  • #Neural Networks
  • Modern AI models typically use 16- or 32-bit floating point numbers for storing weights, which require large memory and processing resources.
  • Microsoft's General Artificial Intelligence group has developed a new ternary neural network model using only -1, 0, or 1 as weight values.
  • This ternary architecture reduces complexity and improves computational efficiency, allowing it to run effectively on a desktop CPU.
  • Despite reduced weight precision, the model claims performance comparable to full-precision models of similar size.
  • Previous quantization techniques have focused on reducing memory usage, with extreme cases like BitNets using single-bit weights.
  • The new BitNet b1.58 2B4T model is a ternary system, referred to as "1.58-bit" because three possible states carry log2(3) ≈ 1.58 bits of information, and is billed as the first open-source, natively trained 1-bit LLM at this scale.
  • Unlike post-training quantization, which can degrade performance, BitNet b1.58 2B4T is trained natively in ternary form, avoiding significant performance loss.
  • The model has 2 billion parameters and was trained on 4 trillion tokens, yet reportedly matches full-precision models of similar size.
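The ternary weighting described above can be sketched in a few lines. This is an illustrative "absmean"-style quantizer, not Microsoft's actual implementation; the function name, the example weights, and the small epsilon guard are assumptions for the sketch:

```python
import numpy as np

def ternary_quantize(w):
    # Scale by the mean absolute weight, then snap each weight to the
    # nearest of {-1, 0, 1}. The original tensor is approximated as q * scale.
    scale = np.abs(w).mean()
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1)
    return q.astype(np.int8), scale

w = np.array([0.31, -0.04, 1.2, -0.9, 0.02])
q, s = ternary_quantize(w)
# q now contains only -1, 0, or 1
```

Because every weight is one of three values, matrix multiplications reduce to additions, subtractions, and skips (no weight multiplies), which is the source of the CPU-friendly efficiency claimed for the model.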