Hasty Briefs (beta)

Mistral Integration Improved in Llama.cpp

13 days ago
  • #llama.cpp
  • #Mistral
  • #GitHub
  • Improved integration of Mistral models into llama.cpp is discussed in issue #14737.
  • Mistral's architecture uses sliding window attention (SWA) with a default window size of 4096 tokens.
  • Support for passing Jinja chat templates to llama.cpp when serving models.
  • Discussion on updating Pydantic requirements and handling formatting/style changes in PRs.
  • Plans to add support for the Voxtral model to llama.cpp once the current PR merges.
  • Code refactored and merged for cleaner integration and easier maintenance.
  • Release of a Magistral GGUF model, which runs smoothly with llama.cpp.
  • Final review and readiness for merging the PR with community feedback incorporated.
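The sliding window attention mentioned above restricts each token to attending only to the most recent tokens within a fixed window (4096 by default for Mistral), rather than the full causal history. A minimal sketch of how such a mask could be built, using NumPy (this is an illustration of the masking pattern, not llama.cpp's actual implementation):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int = 4096) -> np.ndarray:
    """Boolean attention mask for causal sliding-window attention.

    Position i may attend to position j only if j <= i (causal)
    and j > i - window (within the sliding window).
    """
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    return (j <= i) & (j > i - window)

# Tiny example with window=3 so the banded pattern is visible.
mask = sliding_window_mask(5, window=3)
```

With `window=3`, token 4 can attend to tokens 2, 3, and 4, but not to tokens 0 and 1, which have slid out of the window.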
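"Passing a Jinja template" refers to supplying the chat template that turns a list of role-tagged messages into the flat prompt string a model expects. A minimal sketch using the `jinja2` library, with a deliberately simplified template (this is not Mistral's real template, just an illustration of the mechanism):

```python
from jinja2 import Template

# Illustrative chat template only; real model templates are more involved.
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "{% if m['role'] == 'user' %}[INST] {{ m['content'] }} [/INST]"
    "{% else %}{{ m['content'] }}{% endif %}"
    "{% endfor %}"
)

def render_prompt(messages: list[dict]) -> str:
    """Render role-tagged chat messages into a single prompt string."""
    return Template(CHAT_TEMPLATE).render(messages=messages)

prompt = render_prompt([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!"},
])
# prompt == "[INST] Hello [/INST]Hi!"
```

Serving a model with the correct template matters because a prompt formatted with the wrong delimiters can noticeably degrade instruction-following quality.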