Hasty Briefs (beta)

GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs

2 months ago
  • #inference optimization
  • #BitNet
  • #1-bit LLMs
  • bitnet.cpp is an optimized inference framework for 1-bit LLMs like BitNet b1.58, supporting fast and lossless inference on CPU and GPU.
  • Achieves speedups of 1.37x to 5.07x on ARM CPUs and 2.37x to 6.17x on x86 CPUs, with energy reductions of up to 82.2%.
  • Supports running a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second).
  • Latest optimizations include parallel kernel implementations and embedding quantization, offering additional speedups of 1.15x to 2.1x.
  • Demo available for BitNet b1.58 3B model on Apple M2.
  • Supports various models including BitNet-b1.58-2B-4T, bitnet_b1_58-large, and Falcon3 Family.
  • Requires Python >= 3.9, CMake >= 3.22, and Clang >= 18 for setup.
  • Detailed installation and usage instructions provided for Windows and Debian/Ubuntu users.
  • Includes scripts for running inference benchmarks and generating dummy models for unsupported layouts.
  • Common issues and fixes are documented, such as verifying the Clang installation and initializing the Visual Studio developer tools.
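
The "1-bit" (more precisely, 1.58-bit) in BitNet b1.58 refers to ternary weights. As a rough illustration of the idea, not the bitnet.cpp implementation, the absmean scheme described for BitNet b1.58 scales a weight matrix by the mean of its absolute values, then rounds and clips each weight to {-1, 0, +1}; the function name below is hypothetical:

```python
# Illustrative sketch of BitNet b1.58-style absmean weight quantization.
# This is NOT code from bitnet.cpp; the helper name is made up for clarity.

def absmean_quantize(weights):
    """Map a 2D weight matrix to ternary {-1, 0, +1} plus a scale factor.

    scale = mean(|W|); each weight is divided by the scale, rounded,
    and clipped to [-1, 1], giving ~1.58 bits of information per weight.
    """
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat)
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return quantized, scale

W = [[0.9, -0.05, -1.2], [0.4, 0.0, -0.7]]
Q, s = absmean_quantize(W)
# Q == [[1, 0, -1], [1, 0, -1]]
```

At inference time the ternary values turn most multiplications into additions and subtractions, which is where the CPU speedups and energy savings come from.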
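
For context on the setup bullets above, the repository's quick-start follows this general shape; treat the exact flags, model path, and prompt below as examples to verify against the current README rather than a definitive recipe:

```shell
# Quick-start sketch based on the microsoft/BitNet README; check the
# README for current flags before running. Requires Python >= 3.9,
# CMake >= 3.22, and Clang >= 18.
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download a supported checkpoint (example: the 2B-4T GGUF weights)
# and prepare the environment with the i2_s quantization layout.
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

# Run inference with an example prompt.
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" -cnv
```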