MLX-Serve: A Native LLM Runtime for Apple Silicon
- #macOS App
- #AI Inference
- #OpenAI-Compatible
- An inference server and macOS menu bar app, built in Zig and Swift, that works as a drop-in replacement for the OpenAI API with chat completions, streaming, tool calling, embeddings, and logprobs.
- Uses direct MLX-C bindings with no Python runtime for fast inference, reuses the KV cache across requests, and runs quantized MLX-format models from HuggingFace, with 7 built-in tools and prompt-based skills extendable via markdown files.
- Streams responses in real time over SSE with automatic tool-call detection for multi-turn reasoning, ships a native macOS app for managing models and chats, and supports models from Google, Alibaba, Meta, and Mistral AI across a range of parameter sizes.
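Because the server advertises OpenAI API compatibility, a standard chat-completions request body should work against it. The sketch below builds such a request with one tool definition; the base URL, port, and model name are assumptions for illustration, not values documented by MLX-Serve.

```python
import json

# Hypothetical local endpoint; MLX-Serve's actual host/port may differ.
BASE_URL = "http://localhost:8080/v1/chat/completions"

# An OpenAI-style chat completion request with one tool definition.
# The schema follows the OpenAI Chat Completions API that MLX-Serve
# is compatible with; the model name below is illustrative.
payload = {
    "model": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Cupertino?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": True,  # ask the server to stream tokens over SSE
}

body = json.dumps(payload)
print(json.loads(body)["tools"][0]["function"]["name"])  # -> get_weather
```

Any OpenAI client library pointed at the local base URL should be able to send this payload unchanged.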
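The real-time streaming mentioned above follows the usual OpenAI SSE format: each event is a `data: {json}` line carrying a delta, terminated by `data: [DONE]`. A minimal parser for that format, run here against a simulated stream rather than a live server, might look like:

```python
import json

def parse_sse_chunks(raw: str) -> str:
    """Concatenate delta content from OpenAI-style SSE lines.

    Each event line looks like `data: {json}`; the stream ends
    with `data: [DONE]`.
    """
    text = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):  # role-only deltas carry no text
            text.append(delta["content"])
    return "".join(text)

# Simulated stream, shaped like what an OpenAI-compatible server emits:
raw = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n'
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n'
    'data: {"choices":[{"delta":{"content":", world"}}]}\n'
    'data: [DONE]\n'
)
print(parse_sse_chunks(raw))  # -> Hello, world
```

In a real client the same loop would iterate over the HTTP response body line by line; tool-call deltas would arrive under `delta["tool_calls"]` instead of `delta["content"]`.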