GitHub - ggml-org/llama.cpp: LLM inference in C/C++
- #LLM
- #Inference
- #C++
- llama.cpp enables efficient LLM inference in C/C++ with minimal setup and high performance across diverse hardware.
- Key features include support for multiple hardware backends (CUDA, Metal, SYCL), various quantization levels, and CPU+GPU hybrid inference.
- Models must be in GGUF format; the ecosystem includes tooling for converting Hugging Face checkpoints, quantizing them, and serving them over an OpenAI-compatible HTTP API via llama-server.
- The project supports a wide range of models, offers numerous bindings for different programming languages, and has a growing ecosystem of UIs and tools.
- Infrastructure integrations and community contributions are encouraged, with detailed guides for installation, usage, and development.
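The convert → quantize → serve workflow mentioned above can be sketched with the repo's own tools (`convert_hf_to_gguf.py`, `llama-quantize`, `llama-server`); the model directory, file names, and port below are placeholders, not values from the source:

```shell
# Sketch of a typical llama.cpp workflow; paths and names are placeholders.

# 1. Convert a Hugging Face model to GGUF (script ships with the repo):
python convert_hf_to_gguf.py ./my-hf-model --outfile model-f16.gguf

# 2. Quantize to a smaller format, e.g. 4-bit Q4_K_M:
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# 3. Serve over an OpenAI-compatible HTTP API:
./llama-server -m model-q4_k_m.gguf --port 8080
```

Once running, llama-server accepts requests on endpoints such as `/v1/chat/completions`, so existing OpenAI-style clients can point at it with only a base-URL change.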