Hasty Briefs (beta)


Running Local LLMs Offline on a Ten-Hour Flight

8 hours ago
  • #AI Engineering
  • #Local LLMs
  • #Hardware Performance
  • Author tested local LLMs on a MacBook Pro M5 Max with 128GB of memory, running Gemma 4 31B and Qwen 4.6 36B via LM Studio during a 10-hour flight with no Wi-Fi.
  • Built a billing analytics tool for cloud spend on DuckDB and processed 4M tokens across smaller tasks; for tightly scoped work the local models performed comparably to frontier models.
  • Encountered limitations: high power draw (~1% battery per minute), overheating, degraded performance past 100k tokens, and occasional infinite generation loops on some prompts.
  • Created instrumentation tools, powermonitor for power telemetry and lmstats for LLM performance metrics, to observe system behavior before acting on it.
  • Community discussion highlighted local inference's benefits for building cost intuition and Apple Silicon's efficiency; cable choice also significantly affected power delivery (60W vs. 94W).
  • Key takeaways: local inference is viable for specific engineering tasks and enforces discipline in prompt and context management, while the cloud remains better for high-value or large-context work.
  • Future plans include retesting with the correct cable for improved power delivery and exploring small, Neural Engine-powered LLMs for efficiency.
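
The billing-analytics idea above boils down to aggregation queries over spend records. A minimal sketch under stated assumptions: the schema and dollar figures are hypothetical, and Python's stdlib sqlite3 stands in for DuckDB here so the example is self-contained (DuckDB accepts essentially the same SQL for a query like this).

```python
import sqlite3

# Hypothetical cloud-spend rows: (service, region, usd_cost).
rows = [
    ("compute", "us-east-1", 120.50),
    ("compute", "eu-west-1", 80.25),
    ("storage", "us-east-1", 30.00),
    ("egress",  "us-east-1", 12.75),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spend (service TEXT, region TEXT, usd_cost REAL)")
con.executemany("INSERT INTO spend VALUES (?, ?, ?)", rows)

# Total spend per service, largest first: the core of a billing dashboard.
totals = con.execute(
    "SELECT service, ROUND(SUM(usd_cost), 2) AS total "
    "FROM spend GROUP BY service ORDER BY total DESC"
).fetchall()
for service, total in totals:
    print(f"{service}: ${total}")  # compute first, at $200.75
```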
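
The ~1% battery per minute figure puts a hard ceiling on offline inference time, which a powermonitor-style tool can make explicit. This is a hypothetical stdlib-only sketch, not the author's actual tool: it takes periodic battery-percentage samples and reports the drain rate and projected minutes remaining.

```python
def drain_rate(samples: list[tuple[float, float]]) -> float:
    """Percent battery drained per minute, from (minutes, percent) samples."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    return (p0 - p1) / (t1 - t0)

def minutes_remaining(percent_now: float, rate: float) -> float:
    """Projected minutes until empty at the observed drain rate."""
    return percent_now / rate

# Hypothetical samples: battery fell from 90% to 80% over 10 minutes,
# matching the roughly 1%/min drain the article reports under load.
samples = [(0.0, 90.0), (5.0, 85.0), (10.0, 80.0)]
rate = drain_rate(samples)
print(f"{rate:.1f} %/min, ~{minutes_remaining(80.0, rate):.0f} min left")
```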
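
Similarly, lmstats-style throughput metrics reduce to tokens over wall-clock time. A minimal sketch with hypothetical names and numbers, which also flags the slowdown the article reports once the context grows past roughly 100k tokens:

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation throughput: tokens emitted divided by elapsed seconds."""
    return n_tokens / elapsed_s

def context_warning(context_tokens: int, limit: int = 100_000) -> bool:
    """True once the context exceeds the size where performance degraded."""
    return context_tokens > limit

# Hypothetical run: 1800 tokens generated in 60 seconds.
print(f"{tokens_per_second(1800, 60.0):.1f} tok/s")  # 30.0 tok/s
print(context_warning(120_000))  # True: time to trim or restart the context
```

Watching numbers like these before intervening, rather than guessing, is the discipline the powermonitor and lmstats bullets describe.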