GitHub - ggml-org/llama.cpp: LLM inference in C/C++
- #LLM
- #Inference
- #C++
- llama.cpp enables efficient LLM inference in C/C++ with minimal setup and high performance across diverse hardware.
- Key features include support for multiple hardware backends (CUDA, Metal, SYCL), various quantization levels, and CPU+GPU hybrid inference.
- Models must be in GGUF format; the ecosystem includes tooling for converting Hugging Face checkpoints, quantizing them, and serving them over an OpenAI-compatible HTTP API via llama-server.
- The project supports a wide range of models, offers numerous bindings for different programming languages, and has a growing ecosystem of UIs and tools.
- Infrastructure integrations and community contributions are encouraged, with detailed guides for installation, usage, and development.
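The convert → quantize → serve workflow mentioned above can be sketched with the repo's own tools (`convert_hf_to_gguf.py`, `llama-quantize`, `llama-server`); the model directory, file names, and port below are placeholders, not values from the source:

```shell
# Sketch of a typical llama.cpp workflow; paths and names are placeholders.

# 1. Convert a Hugging Face model to GGUF (script ships with the repo):
python convert_hf_to_gguf.py ./my-hf-model --outfile model-f16.gguf

# 2. Quantize to a smaller format, e.g. 4-bit Q4_K_M:
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# 3. Serve over an OpenAI-compatible HTTP API:
./llama-server -m model-q4_k_m.gguf --port 8080
```

Once running, llama-server accepts requests on endpoints such as `/v1/chat/completions`, so existing OpenAI-style clients can point at it with only a base-URL change.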