
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

9 days ago
  • #Local Inference
  • #Energy Efficiency
  • #Artificial Intelligence
  • Local language models (<=20B parameters) now perform competitively with frontier models on many tasks.
  • Local accelerators (e.g., Apple M4 Max) enable interactive latencies for small LMs.
  • Proposed metric: Intelligence per Watt (IPW), defined as task accuracy divided by power drawn (accuracy per unit of power), for evaluating the efficiency of local AI.
  • Study covers 20+ local LMs, 8 accelerators, and 1M real-world queries, measuring accuracy, energy, latency, and power.
  • Findings: Local LMs accurately answer 88.7% of single-turn chat and reasoning queries.
  • IPW improved 5.3x from 2023 to 2025; local query coverage rose from 23.2% to 71.3%.
  • For identical models, local accelerators achieve at least 1.4x lower IPW than cloud accelerators, i.e., they are less efficient by this metric.
  • IPW profiling harness released for systematic benchmarking of intelligence efficiency.
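
The IPW definition above (task accuracy divided by power) can be sketched in a few lines of Python. This is a minimal illustration, not the released profiling harness: the `Run` record, its field names, and the choice to derive mean power as total energy over wall-clock time are all assumptions made here, and the numbers in the usage lines are invented for the example rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one benchmark run (model + accelerator)."""
    correct: int          # queries answered correctly
    total: int            # queries attempted
    energy_joules: float  # total energy consumed over the run
    duration_s: float     # wall-clock duration of the run


def intelligence_per_watt(run: Run) -> float:
    """Return accuracy divided by mean power in watts.

    Assumption: mean power is total energy / duration (1 W = 1 J/s).
    """
    if run.total <= 0 or run.duration_s <= 0 or run.energy_joules <= 0:
        raise ValueError("total, duration_s and energy_joules must be positive")
    accuracy = run.correct / run.total
    mean_power_w = run.energy_joules / run.duration_s
    return accuracy / mean_power_w


def ipw_ratio(a: Run, b: Run) -> float:
    """How many times higher run a's IPW is than run b's."""
    return intelligence_per_watt(a) / intelligence_per_watt(b)


# Invented numbers, for illustration only: same accuracy, different power draw.
run_a = Run(correct=80, total=100, energy_joules=5000.0, duration_s=100.0)   # 50 W
run_b = Run(correct=80, total=100, energy_joules=10000.0, duration_s=100.0)  # 100 W
print(ipw_ratio(run_a, run_b))
```

With accuracy held equal, the ratio reduces to the inverse ratio of mean power, which is why comparisons "for identical models" isolate the hardware's contribution to the metric.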