Ollama is now powered by MLX on Apple Silicon in preview
- #AI Acceleration
- #Apple Silicon
- #Machine Learning Framework
- Ollama now offers a preview backend built on Apple's MLX machine learning framework, speeding up local inference for demanding tasks on Apple Silicon Macs.
- MLX leverages the unified memory architecture and the GPU Neural Accelerators on M5 chips, reducing time to first token and increasing tokens per second.
- NVIDIA's NVFP4 4-bit floating-point format is integrated to maintain model accuracy while reducing memory bandwidth and storage needs.
- Enhanced caching system reuses cache across conversations, lowers memory use, and features intelligent checkpoints and smarter eviction.
- The preview release focuses on the Qwen3.5-35B-A3B model for coding tasks and requires a Mac with more than 32GB of unified memory; support for more models is planned.
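To see why a 4-bit format like NVFP4 cuts memory bandwidth and storage so sharply, here is a rough back-of-envelope sketch. The parameter count and the effective bits-per-weight figure are illustrative assumptions (NVFP4 packs 4-bit values in blocks of 16, each block carrying an 8-bit scale, for roughly 4.5 effective bits per weight), not measurements of Ollama or MLX.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at a given effective bit width."""
    return n_params * bits_per_weight / 8

params = 35e9  # assumed ~35B-parameter model, for illustration only

# FP16: 16 bits per weight.
fp16_gb = weight_bytes(params, 16) / 1e9

# NVFP4: 4-bit values plus an 8-bit scale shared by each block of 16,
# i.e. roughly 4 + 8/16 = 4.5 effective bits per weight.
nvfp4_gb = weight_bytes(params, 4.5) / 1e9

print(f"FP16 weights:  ~{fp16_gb:.0f} GB")   # ~70 GB
print(f"NVFP4 weights: ~{nvfp4_gb:.0f} GB")  # ~20 GB
```

Under these assumptions the weights shrink by roughly 3.5x, which is why a model of this size can fit comfortably in 32GB-plus of unified memory while also moving far fewer bytes per token.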