GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs
2 months ago
- #inference optimization
- #BitNet
- #1-bit LLMs
- bitnet.cpp is an optimized inference framework for 1-bit LLMs like BitNet b1.58, supporting fast and lossless inference on CPU and GPU.
- Achieves speedups of 1.37x to 6.17x on ARM and x86 CPUs, with energy reductions up to 82.2%.
- Supports running a 100B BitNet b1.58 model on a single CPU at speeds comparable to human reading (5-7 tokens per second).
- Latest optimizations include parallel kernel implementations and embedding quantization, offering additional speedups of 1.15x to 2.1x.
- Demo available for BitNet b1.58 3B model on Apple M2.
- Supports various models including BitNet-b1.58-2B-4T, bitnet_b1_58-large, and Falcon3 Family.
- Requires Python >= 3.9, CMake >= 3.22, and Clang >= 18 for setup.
- Detailed installation and usage instructions provided for Windows and Debian/Ubuntu users.
- Includes scripts for running inference benchmarks and generating dummy models for unsupported layouts.
- Common issues and fixes documented, such as clang installation verification and Visual Studio tool initialization.
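The setup and inference flow summarized above can be sketched roughly as below. This is a hedged reconstruction from the repo's README as saved; the exact flags of `setup_env.py` and `run_inference.py`, and the `microsoft/BitNet-b1.58-2B-4T-gguf` model repo name, should be verified against the current README before use.

```shell
# Clone with submodules -- bitnet.cpp builds on a llama.cpp fork
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet

# Install the Python dependencies used by the helper scripts
pip install -r requirements.txt

# Download a supported model in GGUF format (example: the 2B-4T model),
# then build the environment; -q selects the quantization kernel (e.g. i2_s)
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
  --local-dir models/BitNet-b1.58-2B-4T
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

# Run inference: -m model path, -p prompt, -cnv enables chat mode
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" -cnv
```

The repo also ships `utils/e2e_benchmark.py` for the inference benchmarks mentioned above; see the README for its current arguments.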