Running Local LLMs Offline on a Ten-Hour Flight
- #AI Engineering
- #Local LLMs
- #Hardware Performance
- Tested local LLMs on a MacBook Pro M5 Max with 128GB of memory, running Gemma 4 31B and Qwen 4.6 36B via LM Studio during a 10-hour flight with no Wi-Fi.
- Built a DuckDB-based billing analytics tool for cloud spend and processed roughly 4M tokens across smaller tasks; for tightly scoped work, the local models performed comparably to frontier models (a sketch of the kind of DuckDB query involved appears after this list).
- Encountered limitations: high power draw (~1% battery per minute), overheating, degraded performance beyond 100k tokens of context, and occasional infinite generation loops on some prompts.
- Created two instrumentation tools, powermonitor for power telemetry and lmstats for LLM performance metrics, to observe system behavior before acting on it (see the telemetry sketch after this list).
- Community discussion highlighted the cost intuition that local inference builds and Apple Silicon's efficiency; cable choice also turned out to matter significantly for power delivery (60W vs. 94W; see the wattage check below).
- Key takeaways: local inference is viable for well-scoped engineering tasks and enforces discipline in prompt and context management, while the cloud remains better for high-value or large-context work.
- Future plans include retesting with the correct cable for full power delivery and exploring small LLMs on the Neural Engine for efficiency.
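The summary does not include the billing tool's source, so here is a minimal sketch of the kind of query such a DuckDB-based cloud-spend tool might run. The file name `billing_export.csv` and its columns (`service`, `usage_date`, `cost_usd`) are assumptions for illustration, not the author's actual schema:

```python
# Minimal sketch of a DuckDB cloud-spend query. The CSV file and its
# schema (service, usage_date, cost_usd) are hypothetical.
import duckdb

con = duckdb.connect()  # in-memory database; no server process needed

# Aggregate monthly spend per service, reading the CSV export directly.
rows = con.execute("""
    SELECT service,
           date_trunc('month', usage_date) AS month,
           round(sum(cost_usd), 2)         AS total_usd
    FROM read_csv_auto('billing_export.csv')
    GROUP BY service, month
    ORDER BY total_usd DESC
""").fetchall()

for service, month, total in rows:
    print(f"{month:%Y-%m}  {service:<20} ${total}")
```

DuckDB runs entirely in-process, which is part of why it suits a no-network flight: the whole analysis is a single local file and a library import.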
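The internals of powermonitor are not described in the summary. As a rough sketch of what battery telemetry on macOS could build on, the standard `pmset` utility reports charge level; the one-minute sampling interval and the parsing below are assumptions, not the author's implementation:

```python
# Rough sketch of battery polling on macOS via the standard `pmset`
# utility; the author's powermonitor tool may work differently.
import re
import subprocess
import time

def battery_percent() -> int:
    """Parse the charge percentage out of `pmset -g batt` output."""
    out = subprocess.run(["pmset", "-g", "batt"],
                         capture_output=True, text=True).stdout
    match = re.search(r"(\d+)%", out)
    return int(match.group(1)) if match else -1

# Sample once a minute to estimate drain rate during inference.
prev = battery_percent()
while True:
    time.sleep(60)
    cur = battery_percent()
    print(f"battery {cur}% (drain ~{prev - cur}%/min)")
    prev = cur
```

At the ~1%-per-minute drain the post reports, even a coarse sampler like this makes the cost of a long generation visible in real time.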
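On the cable point: macOS exposes the charger's negotiated wattage through `system_profiler`, which is one way to check whether a cable is limiting power delivery (e.g. 60W instead of 94W). A quick sketch; the exact output format may vary by hardware:

```python
# Check the negotiated charger wattage on macOS using the standard
# system_profiler tool; output fields may differ across machines.
import subprocess

out = subprocess.run(["system_profiler", "SPPowerDataType"],
                     capture_output=True, text=True).stdout
for line in out.splitlines():
    if "Wattage" in line:
        print(line.strip())  # e.g. "Wattage (W): 96"
```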