GitHub - Blaizzy/mlx-vlm: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- #multimodal AI
- #model optimization
- #MLX framework
- MLX-VLM supports inference and fine-tuning of Vision Language Models (VLMs) on macOS using the MLX library.
- It installs via pip, and models can be run through CLI commands for text, image, audio, and multimodal generation.
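The install and a first CLI run can be sketched as follows (the model name and flags are illustrative assumptions; check the repo README for the exact options):

```shell
# Install (MLX targets Apple Silicon Macs)
pip install mlx-vlm

# Generate text from an image via the CLI module
# (model name and flag values are illustrative examples)
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --max-tokens 100 \
  --prompt "Describe this image." \
  --image path/to/image.jpg
```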
- Features include Activation Quantization for CUDA, multi-image chat, video analysis, and TurboQuant KV-cache compression, which significantly reduces memory usage.
- Python scripting allows loading models through APIs such as `load` and `generate`, and applying chat templates to produce correctly formatted prompts.
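A minimal sketch of that Python flow. The `load`, `generate`, and `apply_chat_template` names follow the repo README, but the exact argument details are assumptions; the snippet is guarded so it degrades gracefully where mlx-vlm is not installed (it requires Apple Silicon):

```python
# Hedged sketch of the mlx-vlm Python API; argument details are assumptions.
try:
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"  # example model
    model, processor = load(model_path)
    config = load_config(model_path)

    # Apply the model's chat template so the prompt is correctly formatted.
    prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
    result = generate(model, processor, prompt, image=["path/to/image.jpg"])
except ImportError:
    # mlx-vlm unavailable on this machine; fall back gracefully.
    result = "mlx-vlm is not installed (pip install mlx-vlm on Apple Silicon)"

print(result)
```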
- A Gradio-based chat interface and a web server are available; the server exposes OpenAI-compatible chat-completions and responses endpoints with dynamic model loading.
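For a sense of what a request to such a server looks like, here is a payload sketch following the OpenAI chat-completions convention the server mimics; the endpoint path, port, and model name are assumptions, not taken from the repo:

```python
import json

# Hypothetical local server URL; port and path are assumptions.
url = "http://localhost:8080/v1/chat/completions"

# Message "content" parts follow the OpenAI multimodal convention:
# a text part plus an image_url part in one user message.
payload = {
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)
print(body[:60])
```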
- TurboQuant compresses the KV cache with quantization schemes, achieving over 75% memory reduction while maintaining performance at long context lengths.
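The ~75% figure follows directly from bit-width arithmetic: quantizing a 16-bit KV cache to 4 bits keeps one quarter of the bytes. A back-of-envelope sketch with hypothetical model dimensions (not TurboQuant's actual scheme, which may also store scales and offsets):

```python
# Back-of-envelope KV-cache memory for a hypothetical 32-layer model,
# illustrating why 4-bit quantization of a 16-bit cache gives ~75% savings.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bits: int) -> int:
    # Two tensors (K and V) per layer, one vector per head per position.
    return 2 * layers * kv_heads * head_dim * seq_len * bits // 8

fp16 = kv_cache_bytes(32, 8, 128, 32_768, 16)
q4 = kv_cache_bytes(32, 8, 128, 32_768, 4)
print(f"fp16: {fp16 / 2**30:.2f} GiB, 4-bit: {q4 / 2**30:.2f} GiB")
print(f"reduction: {1 - q4 / fp16:.0%}")  # -> reduction: 75%
```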
- Fine-tuning is supported via LoRA and QLoRA techniques, which are detailed in separate documentation.