Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
9 days ago
- #Local Inference
- #Energy Efficiency
- #Artificial Intelligence
- Local language models (LMs) with <=20B parameters now match frontier models in performance on many tasks.
- Local accelerators (e.g., Apple M4 Max) enable interactive latencies for small LMs.
- Proposed metric: Intelligence per Watt (IPW), defined as task accuracy divided by unit of power, for evaluating the efficiency of local AI.
- Study covers 20+ local LMs, 8 accelerators, and 1M real-world queries, measuring accuracy, energy, latency, and power.
- Findings: Local LMs accurately answer 88.7% of single-turn chat and reasoning queries.
- IPW improved 5.3x from 2023 to 2025; local query coverage rose from 23.2% to 71.3%.
- Local accelerators achieve at least 1.4x lower IPW than cloud accelerators running identical models, meaning local hardware currently delivers less accuracy per watt.
- IPW profiling harness released for systematic benchmarking of intelligence efficiency.
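
To make the IPW definition above concrete, here is a minimal sketch of how accuracy per watt could be computed from per-query measurements. It is not the released profiling harness: the `QueryResult` record, its field names, and the `intelligence_per_watt` function are hypothetical, and the choice to aggregate power as total energy divided by total time is an assumption, since this summary does not say how the study averages power. The sample numbers are illustrative only, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    """One measured query (hypothetical record; field names are assumptions)."""
    correct: bool          # whether the answer was judged correct
    energy_joules: float   # energy consumed while serving the query
    latency_s: float       # wall-clock time spent serving the query


def intelligence_per_watt(results):
    """Return accuracy divided by mean power draw in watts.

    Assumption: mean power is total energy (J) over total time (s).
    """
    if not results:
        raise ValueError("need at least one query result")
    total_time = sum(r.latency_s for r in results)
    if total_time <= 0:
        raise ValueError("total latency must be positive")
    accuracy = sum(r.correct for r in results) / len(results)
    mean_power_w = sum(r.energy_joules for r in results) / total_time
    return accuracy / mean_power_w


# Illustrative measurements: 3 of 4 correct, 320 J consumed over 10 s (32 W).
sample = [
    QueryResult(True, 100.0, 2.0),
    QueryResult(False, 60.0, 2.0),
    QueryResult(True, 80.0, 4.0),
    QueryResult(True, 80.0, 2.0),
]
print(intelligence_per_watt(sample))  # 0.75 / 32 W = 0.0234375
```

Under this definition, two systems running the same model reach the same accuracy, so the one drawing more power has the lower IPW.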