Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

9 days ago

Local LMs (<=20B parameters) are now competitive with frontier models on many tasks.
Local accelerators (e.g., Apple M4 Max) enable interactive latencies for small LMs.
Proposed metric: Intelligence per Watt (IPW), defined as task accuracy divided by unit of power, for evaluating the efficiency of local AI.
Study covers 20+ local LMs, 8 accelerators, and 1M real-world queries, measuring accuracy, energy, latency, and power.
Findings: Local LMs accurately answer 88.7% of single-turn chat and reasoning queries.
IPW improved 5.3x from 2023 to 2025; local query coverage rose from 23.2% to 71.3%.
Local accelerators achieve at least 1.4x lower IPW than cloud accelerators running identical models, so local hardware is currently the less efficient of the two.
IPW profiling harness released for systematic benchmarking of intelligence efficiency.
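
The IPW definition above (task accuracy divided by unit of power) can be illustrated with a minimal Python sketch. This is not the released profiling harness: the `QueryResult` record, its field names, and the choice to estimate mean power as total energy over total latency are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueryResult:
    """Hypothetical per-query measurement (field names are assumptions)."""
    correct: bool           # whether the model's answer was judged correct
    energy_joules: float    # energy drawn while serving the query
    latency_seconds: float  # wall-clock time to serve the query


def intelligence_per_watt(results: Sequence[QueryResult]) -> float:
    """Accuracy divided by mean power draw, in watts.

    Assumption: mean power is total energy divided by total latency.
    """
    if not results:
        raise ValueError("need at least one query result")
    total_time = sum(r.latency_seconds for r in results)
    if total_time <= 0:
        raise ValueError("total latency must be positive")
    accuracy = sum(1 for r in results if r.correct) / len(results)
    mean_power_watts = sum(r.energy_joules for r in results) / total_time
    return accuracy / mean_power_watts


# Made-up numbers: 3 of 4 queries correct, each using 10 J over 2 s (5 W).
sample = [
    QueryResult(True, 10.0, 2.0),
    QueryResult(True, 10.0, 2.0),
    QueryResult(False, 10.0, 2.0),
    QueryResult(True, 10.0, 2.0),
]
ipw = intelligence_per_watt(sample)  # 0.75 accuracy / 5 W
```

Under these assumptions, comparing two accelerators on the same model and query set reduces to taking the ratio of their IPW values.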
