GitHub - Blaizzy/mlx-vlm: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- #multimodal AI
- #model optimization
- #MLX framework
- MLX-VLM supports inference and fine-tuning of Vision Language Models (VLMs) on macOS using the MLX library.
- It installs via pip, and models can be run through CLI commands for text, image, audio, and multimodal generation.
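The install and a first CLI run can be sketched as follows (the model name and flags are illustrative assumptions; check the repo README for the exact options):

```shell
# Install (MLX targets Apple Silicon Macs)
pip install mlx-vlm

# Generate text from an image via the CLI module
# (model name and flag values are illustrative examples)
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --max-tokens 100 \
  --prompt "Describe this image." \
  --image path/to/image.jpg
```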
- Features include Activation Quantization for CUDA, multi-image chat, video analysis, and TurboQuant KV-cache compression, which significantly reduces memory usage.
- Python scripting allows loading models through APIs such as `load` and `generate`, and applying chat templates to produce correctly formatted prompts.
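A minimal sketch of that Python flow. The `load`, `generate`, and `apply_chat_template` names follow the repo README, but the exact argument details are assumptions; the snippet is guarded so it degrades gracefully where mlx-vlm is not installed (it requires Apple Silicon):

```python
# Hedged sketch of the mlx-vlm Python API; argument details are assumptions.
try:
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"  # example model
    model, processor = load(model_path)
    config = load_config(model_path)

    # Apply the model's chat template so the prompt is correctly formatted.
    prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
    result = generate(model, processor, prompt, image=["path/to/image.jpg"])
except ImportError:
    # mlx-vlm unavailable on this machine; fall back gracefully.
    result = "mlx-vlm is not installed (pip install mlx-vlm on Apple Silicon)"

print(result)
```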
- A Gradio-based chat interface and a web server are available; the server exposes OpenAI-compatible chat-completions and responses endpoints with dynamic model loading.
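For a sense of what a request to such a server looks like, here is a payload sketch following the OpenAI chat-completions convention the server mimics; the endpoint path, port, and model name are assumptions, not taken from the repo:

```python
import json

# Hypothetical local server URL; port and path are assumptions.
url = "http://localhost:8080/v1/chat/completions"

# Message "content" parts follow the OpenAI multimodal convention:
# a text part plus an image_url part in one user message.
payload = {
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)
print(body[:60])
```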
- TurboQuant compresses the KV cache with quantization schemes, achieving over 75% memory reduction while maintaining performance at long context lengths.
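The ~75% figure follows directly from bit-width arithmetic: quantizing a 16-bit KV cache to 4 bits keeps one quarter of the bytes. A back-of-envelope sketch with hypothetical model dimensions (not TurboQuant's actual scheme, which may also store scales and offsets):

```python
# Back-of-envelope KV-cache memory for a hypothetical 32-layer model,
# illustrating why 4-bit quantization of a 16-bit cache gives ~75% savings.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bits: int) -> int:
    # Two tensors (K and V) per layer, one vector per head per position.
    return 2 * layers * kv_heads * head_dim * seq_len * bits // 8

fp16 = kv_cache_bytes(32, 8, 128, 32_768, 16)
q4 = kv_cache_bytes(32, 8, 128, 32_768, 4)
print(f"fp16: {fp16 / 2**30:.2f} GiB, 4-bit: {q4 / 2**30:.2f} GiB")
print(f"reduction: {1 - q4 / fp16:.0%}")  # -> reduction: 75%
```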
- Fine-tuning is supported via LoRA and QLoRA techniques, which are detailed in separate documentation.