What's in a GGUF, besides the weights – and what's still missing?
- #Metadata
- #Language Models
- #GGUF
- GGUF is the single-file format llama.cpp uses for language models; by consolidating weights, tokenizer data, and configuration metadata into one file, it makes distributing and loading models more ergonomic than juggling separate config files.
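To make the "metadata in one file" point concrete, here is a minimal sketch of parsing the GGUF header and a string-valued metadata key/value pair, following the published GGUF layout (little-endian magic `GGUF`, version, tensor count, KV count, then KV pairs). The synthetic `blob` and the helper `_kv` are illustrative, not part of any real model file.

```python
import struct

def parse_gguf_header(buf: bytes):
    """Parse the fixed GGUF header plus string-valued metadata KVs.

    Layout per the GGUF spec (all little-endian): magic b"GGUF",
    version (uint32), tensor_count (uint64), metadata_kv_count (uint64),
    followed by the key/value pairs.
    """
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    assert magic == b"GGUF", "not a GGUF file"
    off = struct.calcsize("<4sIQQ")
    metadata = {}
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", buf, off); off += 8
        key = buf[off:off + klen].decode("utf-8"); off += klen
        (vtype,) = struct.unpack_from("<I", buf, off); off += 4
        if vtype == 8:  # 8 = string in the GGUF value-type enum
            (vlen,) = struct.unpack_from("<Q", buf, off); off += 8
            metadata[key] = buf[off:off + vlen].decode("utf-8"); off += vlen
        else:
            break  # other value types omitted in this sketch
    return version, n_tensors, metadata

def _kv(key: str, val: str) -> bytes:
    """Encode one string-typed KV pair (illustrative helper)."""
    k, v = key.encode(), val.encode()
    return (struct.pack("<Q", len(k)) + k +
            struct.pack("<I", 8) +
            struct.pack("<Q", len(v)) + v)

# Tiny synthetic header: version 3, zero tensors, one metadata KV.
blob = struct.pack("<4sIQQ", b"GGUF", 3, 0, 1) + _kv("general.architecture", "llama")
version, n_tensors, meta = parse_gguf_header(blob)
print(version, n_tensors, meta)  # → 3 0 {'general.architecture': 'llama'}
```

In a real file the KV section is followed by tensor info blocks and the tensor data itself; loaders read everything they need from this one stream.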
- Chat templates in GGUF define the conversation format as Jinja2 templates; the default is stored under the tokenizer.chat_template metadata key, though a model may ship multiple variants.
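To show what a chat template produces, here is a plain-Python stand-in for a ChatML-style template (the `<|im_start|>`/`<|im_end|>` layout used by many models). Real GGUF templates are Jinja2 strings rendered by a template engine; this hand-rolled version just illustrates the output format.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a message list the way a ChatML-style chat template would.

    In GGUF the template itself lives in tokenizer.chat_template as a
    Jinja2 string; this function mimics its output for illustration.
    """
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

A runtime that honors the embedded template can format conversations correctly for any model without hard-coding per-model prompt logic.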
- Special tokens, such as the end-of-sequence (eos) token or tool-call markers, control model behavior: they stop generation and delimit structured outputs.
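The eos token's role is easiest to see in the generation loop: sampling stops when the model emits the eos token id. A minimal sketch, with a toy stand-in for the model and an assumed eos id of 2 (a common value in llama-family vocabularies, but read it from the metadata in practice):

```python
EOS_ID = 2  # assumption for this sketch; real runtimes read it from GGUF metadata

def generate(sample_next, max_tokens=32):
    """Append sampled token ids until eos or the budget runs out.

    `sample_next` stands in for the model plus sampler; special tokens
    like eos are matched by id, not by string comparison.
    """
    ids = []
    for _ in range(max_tokens):
        tok = sample_next(ids)
        if tok == EOS_ID:  # eos terminates generation
            break
        ids.append(tok)
    return ids

# Toy "model" that emits three tokens and then eos.
script = iter([5, 7, 9, EOS_ID])
out = generate(lambda ids: next(script))
print(out)  # → [5, 7, 9]
```

The same id-based matching applies to tool-call markers: the runtime watches for their ids to know where a structured block begins and ends.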
- Sampler configurations can be embedded in GGUF files, including the order in which sampling steps are applied, via the general.sampling.sequence field, so a model ships with sensible token-selection defaults.
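A sampler sequence is just an ordered chain of transforms over the logits before the final draw. A minimal sketch, where `SAMPLER_SEQUENCE` plays the role of an order read from metadata, and the top-k/temperature stages are illustrative choices:

```python
import math
import random

def top_k(logits, k=2):
    """Keep only the k largest logits; mask the rest to -inf."""
    thresh = sorted(logits, reverse=True)[k - 1]
    return [x if x >= thresh else float("-inf") for x in logits]

def temperature(logits, t=0.8):
    """Scale logits; t < 1 sharpens the distribution."""
    return [x / t for x in logits]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0, so masked ids drop out
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical stage order, standing in for a sequence read from metadata.
SAMPLER_SEQUENCE = [top_k, temperature]

def sample(logits, rng):
    """Apply each configured stage in order, then draw from the result."""
    for stage in SAMPLER_SEQUENCE:
        logits = stage(logits)
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs)[0]

tok = sample([0.1, 2.0, 1.5, -1.0], random.Random(0))
print(tok)  # one of the two top-k survivors: index 1 or 2
```

Because the order matters (top-k before temperature prunes on raw logits; the reverse prunes on scaled ones), storing the sequence alongside the model keeps results reproducible across runtimes.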
- GGUF still lacks a standardized tool-calling format, fields marking "think" tokens so reasoning output can be separated from the final answer, bundled projection models for multimodal support, and feature flags that advertise a model's capabilities.
- Proposed improvements include shipping grammars for parsing tool calls, bundling projection models inside the GGUF file itself, and richer metadata so runtimes can detect features without model-specific code.