Ollama is now powered by MLX on Apple Silicon in preview
- #AI Acceleration
- #Apple Silicon
- #Machine Learning Framework
- Ollama now offers a preview backend built on Apple's MLX machine learning framework, speeding up local inference for demanding tasks on Apple Silicon Macs.
- MLX leverages the unified memory architecture and the GPU Neural Accelerators on M5 chips, reducing time to first token and increasing tokens per second.
- NVIDIA's NVFP4 4-bit floating-point format is integrated to maintain model accuracy while reducing memory bandwidth and storage needs.
- Enhanced caching system reuses cache across conversations, lowers memory use, and features intelligent checkpoints and smarter eviction.
- The preview release focuses on the Qwen3.5-35B-A3B model for coding tasks and requires a Mac with more than 32GB of unified memory; support for more models is planned.
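To see why a 4-bit format like NVFP4 cuts memory bandwidth and storage so sharply, here is a rough back-of-envelope sketch. The parameter count and the effective bits-per-weight figure are illustrative assumptions (NVFP4 packs 4-bit values in blocks of 16, each block carrying an 8-bit scale, for roughly 4.5 effective bits per weight), not measurements of Ollama or MLX.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at a given effective bit width."""
    return n_params * bits_per_weight / 8

params = 35e9  # assumed ~35B-parameter model, for illustration only

# FP16: 16 bits per weight.
fp16_gb = weight_bytes(params, 16) / 1e9

# NVFP4: 4-bit values plus an 8-bit scale shared by each block of 16,
# i.e. roughly 4 + 8/16 = 4.5 effective bits per weight.
nvfp4_gb = weight_bytes(params, 4.5) / 1e9

print(f"FP16 weights:  ~{fp16_gb:.0f} GB")   # ~70 GB
print(f"NVFP4 weights: ~{nvfp4_gb:.0f} GB")  # ~20 GB
```

Under these assumptions the weights shrink by roughly 3.5x, which is why a model of this size can fit comfortably in 32GB-plus of unified memory while also moving far fewer bytes per token.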