Hasty Briefs (beta)


Running Google Gemma 4 Locally with LM Studio's New Headless CLI and Claude Code

18 hours ago
  • #gemma4
  • #llm
  • #local-ai
  • LM Studio 0.4.0 introduced llmster and a CLI for headless local model inference.
  • Google Gemma 4 26B-A4B is a mixture-of-experts model offering high performance with lower resource use.
  • On a MacBook Pro M4 Pro (48 GB), Gemma 4 runs at 51 tokens/second with 256K context support.
  • The article details installing LM Studio CLI, downloading models, and configuring settings.
  • It explains how to estimate memory usage, manage context length, and use parallel requests.
  • Gemma 4 can be integrated with Claude Code via an Anthropic-compatible endpoint for local coding assistance.
  • Key benefits include privacy, cost savings, and offline use, though slower generation speed and memory constraints are noted as trade-offs.
  • The setup involves environment variables to route Claude Code to the local LM Studio server.
  • Future plans include testing other local models like Qwen 3.5 and GLM 4.7 Flash.
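The install-and-download flow summarized above can be sketched roughly as follows. This is a hedged sketch, not the article's exact commands: the `lms` CLI and `lms get` / `lms server start` / `lms load` subcommands are real LM Studio features, but the model identifier, port, and context length here are illustrative assumptions.

```shell
# Bootstrap the LM Studio CLI (documented install path), then run headless.
npx lmstudio install-cli

# Download and serve a model. The identifier below is illustrative --
# check the LM Studio catalog for the actual Gemma listing.
lms get google/gemma-4-26b              # hypothetical model identifier
lms server start --port 1234            # start the local inference server
lms load google/gemma-4-26b --context-length 32768   # assumed context setting
lms ps                                  # confirm the model is loaded
```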
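The memory-estimation step the article describes amounts to simple arithmetic: quantized weight size plus KV-cache size for the chosen context length. A minimal sketch of that calculation follows; the layer count, KV-head count, and head dimension used in the example are assumptions for illustration, not Gemma's published configuration.

```python
def estimate_memory_gib(params_b, bits_per_weight, n_layers, n_kv_heads,
                        head_dim, context_len, kv_bytes=2):
    """Rough RAM estimate in GiB: quantized weights plus KV cache."""
    weights = params_b * 1e9 * bits_per_weight / 8                       # weight bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes   # K and V tensors
    return (weights + kv) / 2**30

# Illustrative numbers: a ~26B-parameter model at 4-bit with a 32K context
# (architecture figures below are assumptions, not Gemma's actual config).
print(round(estimate_memory_gib(26, 4, 48, 8, 128, 32768), 1))  # → 18.1
```

Doubling the context length grows only the KV-cache term, which is why the article treats context length as the main knob for fitting a model into a fixed memory budget.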
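The environment-variable routing mentioned above might look like the sketch below. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` are environment variables Claude Code honors; the port, token value, and model name are illustrative assumptions that would need to match your LM Studio server settings.

```shell
# Point Claude Code at the local Anthropic-compatible endpoint instead of
# Anthropic's API. Values below are placeholders for illustration.
export ANTHROPIC_BASE_URL="http://localhost:1234"   # assumed local server port
export ANTHROPIC_AUTH_TOKEN="lm-studio"             # placeholder; local server ignores it
export ANTHROPIC_MODEL="google/gemma-4-26b"         # hypothetical model identifier

claude   # Claude Code now sends requests to the local LM Studio server
```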