Hasty Briefs (beta)


Running Local LLMs Offline on a Ten-Hour Flight

8 hours ago
  • #AI Engineering
  • #Local LLMs
  • #Hardware Performance
  • Author tested local LLMs on a MacBook Pro M5 Max with 128GB of memory, running Gemma 4 31B and Qwen 4.6 36B via LM Studio during a 10-hour flight with no Wi-Fi.
  • Built a billing analytics tool for cloud spend on DuckDB and processed 4M tokens across smaller tasks; for tightly scoped work the local models performed comparably to frontier models.
  • Encountered limitations: high power draw (~1% battery per minute), overheating, degraded performance past 100k tokens, and occasional infinite generation loops on some prompts.
  • Created instrumentation tools, powermonitor for power telemetry and lmstats for LLM performance metrics, to observe system behavior before acting on it.
  • Community discussion highlighted local inference's benefits for building cost intuition and Apple Silicon's efficiency; cable choice also significantly affected power delivery (60W vs. 94W).
  • Key takeaways: local inference is viable for specific engineering tasks and enforces discipline in prompt and context management, while the cloud remains better for high-value or large-context work.
  • Future plans include retesting with the correct cable for improved power delivery and exploring small, Neural Engine-powered LLMs for efficiency.
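
The billing-analytics idea above boils down to aggregation queries over spend records. A minimal sketch under stated assumptions: the schema and dollar figures are hypothetical, and Python's stdlib sqlite3 stands in for DuckDB here so the example is self-contained (DuckDB accepts essentially the same SQL for a query like this).

```python
import sqlite3

# Hypothetical cloud-spend rows: (service, region, usd_cost).
rows = [
    ("compute", "us-east-1", 120.50),
    ("compute", "eu-west-1", 80.25),
    ("storage", "us-east-1", 30.00),
    ("egress",  "us-east-1", 12.75),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spend (service TEXT, region TEXT, usd_cost REAL)")
con.executemany("INSERT INTO spend VALUES (?, ?, ?)", rows)

# Total spend per service, largest first: the core of a billing dashboard.
totals = con.execute(
    "SELECT service, ROUND(SUM(usd_cost), 2) AS total "
    "FROM spend GROUP BY service ORDER BY total DESC"
).fetchall()
for service, total in totals:
    print(f"{service}: ${total}")  # compute first, at $200.75
```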
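
The ~1% battery per minute figure puts a hard ceiling on offline inference time, which a powermonitor-style tool can make explicit. This is a hypothetical stdlib-only sketch, not the author's actual tool: it takes periodic battery-percentage samples and reports the drain rate and projected minutes remaining.

```python
def drain_rate(samples: list[tuple[float, float]]) -> float:
    """Percent battery drained per minute, from (minutes, percent) samples."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    return (p0 - p1) / (t1 - t0)

def minutes_remaining(percent_now: float, rate: float) -> float:
    """Projected minutes until empty at the observed drain rate."""
    return percent_now / rate

# Hypothetical samples: battery fell from 90% to 80% over 10 minutes,
# matching the roughly 1%/min drain the article reports under load.
samples = [(0.0, 90.0), (5.0, 85.0), (10.0, 80.0)]
rate = drain_rate(samples)
print(f"{rate:.1f} %/min, ~{minutes_remaining(80.0, rate):.0f} min left")
```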
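
Similarly, lmstats-style throughput metrics reduce to tokens over wall-clock time. A minimal sketch with hypothetical names and numbers, which also flags the slowdown the article reports once the context grows past roughly 100k tokens:

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation throughput: tokens emitted divided by elapsed seconds."""
    return n_tokens / elapsed_s

def context_warning(context_tokens: int, limit: int = 100_000) -> bool:
    """True once the context exceeds the size where performance degraded."""
    return context_tokens > limit

# Hypothetical run: 1800 tokens generated in 60 seconds.
print(f"{tokens_per_second(1800, 60.0):.1f} tok/s")  # 30.0 tok/s
print(context_warning(120_000))  # True: time to trim or restart the context
```

Watching numbers like these before intervening, rather than guessing, is the discipline the powermonitor and lmstats bullets describe.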