Running Google Gemma 4 Locally with LM Studio's New Headless CLI and Claude Code
- #gemma4
- #llm
- #local-ai
- LM Studio 0.4.0 introduced llmster and a CLI for headless local model inference, so a server can run with no GUI attached.
- Google Gemma 4 26B-A4B is a mixture-of-experts model: 26B total parameters with roughly 4B active per token (the "A4B" in the name), giving strong quality at much lower compute cost than a dense model of the same size.
- On a MacBook Pro with an M4 Pro chip and 48 GB of unified memory, Gemma 4 generates about 51 tokens/second and supports context windows up to 256K tokens.
- The article walks through installing the LM Studio CLI, downloading models, and configuring inference settings (a command-line sketch follows this list).
- It explains how to estimate memory usage, choose a context length, and serve parallel requests (a back-of-the-envelope estimate appears below).
- Gemma 4 can back Claude Code through an Anthropic-compatible endpoint, turning it into a fully local coding assistant (see the endpoint smoke test below).
- Key benefits are privacy, zero per-token cost, and offline use; the noted trade-offs are slower generation and memory constraints.
- The setup routes Claude Code to the local LM Studio server via a handful of environment variables (sketched at the end of this post).
- Future plans include testing other local models like Qwen 3.5 and GLM 4.7 Flash.
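
The article's exact llmster commands aren't reproduced here, so the sketch below uses the `lms` CLI names that ship with current LM Studio releases (`lms bootstrap`, `lms get`, `lms load`, `lms server start`); the model identifier `google/gemma-4-26b-a4b` is a guess at the catalog name, not a confirmed string.

```bash
# Put the bundled lms CLI on PATH (LM Studio installs it under ~/.lmstudio/bin)
~/.lmstudio/bin/lms bootstrap

# Download the model (identifier is hypothetical; check the actual catalog name)
lms get google/gemma-4-26b-a4b

# Load it with an explicit context length and full GPU offload
lms load google/gemma-4-26b-a4b --context-length 65536 --gpu max

# Start the local server headlessly on the default port
lms server start --port 1234
```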
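To make the memory-estimation advice concrete, here is a rough calculation of weights plus KV cache. The bytes-per-parameter figure assumes ~4-bit quantization, and the layer/head numbers are illustrative placeholders, not Gemma 4's published architecture.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope memory estimate. All architecture numbers below are
# placeholders for illustration, not Gemma 4's actual specs.
awk 'BEGIN {
  params_b = 26      # total parameters, billions
  bpp      = 0.55    # bytes/param at ~4-bit quantization (incl. overhead)
  layers   = 48      # hypothetical layer count
  kv_heads = 8       # hypothetical KV head count
  head_dim = 128     # hypothetical head dimension
  ctx      = 65536   # requested context length in tokens

  weights_gb = params_b * bpp
  # KV cache: 2 tensors (K and V) * 2 bytes (fp16) per element,
  # per layer, per KV head, per head dim, per token
  kv_gb = 2 * 2 * layers * kv_heads * head_dim * ctx / 1e9

  printf "weights : ~%.1f GB\n", weights_gb
  printf "KV cache: ~%.1f GB at %d-token context\n", kv_gb, ctx
  printf "total   : ~%.1f GB (plus runtime overhead)\n", weights_gb + kv_gb
}'
```

With these placeholder numbers the total lands around 27 GB, which is why a 48 GB machine can hold the model but long contexts eat quickly into the remaining headroom.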
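Per the article, LM Studio exposes an Anthropic-compatible endpoint. The smoke test below assumes it lives at `/v1/messages` on the default port 1234 and that the local server accepts any placeholder API key; the request shape follows Anthropic's Messages API, and the model name is the same hypothetical identifier as above.

```bash
# Smoke-test the Anthropic-compatible Messages endpoint.
# The dummy key and model name are assumptions; adjust to match
# what `lms server start` actually reports.
curl -s http://localhost:1234/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: lm-studio" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "google/gemma-4-26b-a4b",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```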
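Claude Code reads its endpoint and model from environment variables. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`, and `ANTHROPIC_SMALL_FAST_MODEL` are documented Claude Code settings; the URL and model identifiers below are assumptions matching the earlier sketches.

```bash
# Route Claude Code to the local LM Studio server instead of Anthropic's API.
export ANTHROPIC_BASE_URL="http://localhost:1234"   # local server from `lms server start`
export ANTHROPIC_AUTH_TOKEN="lm-studio"             # dummy token; the local server ignores it
export ANTHROPIC_MODEL="google/gemma-4-26b-a4b"     # hypothetical identifier
export ANTHROPIC_SMALL_FAST_MODEL="google/gemma-4-26b-a4b"

claude   # launch Claude Code; it now talks to the local endpoint
```

Setting the small/fast model to the same local identifier keeps background tasks from silently calling out to the hosted API.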