Analog Foundation Models
- #LLMs
- #Machine Learning
- #Analog Computing
- Analog in-memory computing (AIMC) performs matrix-vector multiplications directly inside memory arrays, offering large gains in speed and power efficiency for neural network inference.
- AIMC also introduces fundamental challenges such as noisy computations and strict input/output quantization constraints (see the simulation sketch after this list).
- Existing LLMs struggle to achieve 4-bit-level performance on AIMC hardware.
- A new hardware-aware training method robustly adapts LLMs for execution on noisy, low-precision analog hardware (a generic version is sketched after this list).
- State-of-the-art models like Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct retain performance comparable to 4-bit weight, 8-bit activation (W4A8) baselines despite the analog noise and quantization constraints.
- As a byproduct of the training methodology, the models can also be quantized for inference on low-precision digital hardware (see the round-to-nearest sketch below).
- The models also benefit from test-time compute scaling, scaling better than models trained with 4-bit weights and static 8-bit input quantization.
- The work bridges the gap between high-capacity LLMs and efficient analog hardware.
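To make the noise and quantization constraints concrete, here is a minimal simulation sketch of an AIMC matrix-vector multiply. The additive Gaussian weight noise, the `noise_std` value, and the 8-bit input/output widths are illustrative assumptions, not the paper's hardware model.

```python
import torch

def fake_quant(t: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor round-to-nearest fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max().clamp(min=1e-8) / qmax
    return torch.round(t / scale).clamp(-qmax - 1, qmax) * scale

def simulate_aimc_matvec(w, x, in_bits=8, out_bits=8, noise_std=0.02):
    """Toy AIMC tile: quantized inputs/outputs, noisy analog weights.

    noise_std scales Gaussian noise relative to the largest weight;
    real devices exhibit more structured, device-specific noise.
    """
    x_q = fake_quant(x, in_bits)                    # DAC input quantization
    w_noisy = w + noise_std * w.abs().max() * torch.randn_like(w)
    y = x_q @ w_noisy.T                             # analog MVM
    return fake_quant(y, out_bits)                  # ADC output quantization

# Quick check: error of the noisy, quantized MVM vs. the exact result.
w, x = torch.randn(64, 128), torch.randn(4, 128)
err = (simulate_aimc_matvec(w, x) - x @ w.T).abs().mean()
print(f"mean absolute error: {err:.4f}")
```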
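The adaptation method is, at its core, hardware-aware training: inject fresh weight noise on every forward pass so the model learns weights robust to perturbation, and keep weight magnitudes bounded by clipping. The sketch below is a generic version of that idea; the noise scale and the fixed clipping bound are illustrative choices, not the authors' exact pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Linear):
    """Linear layer injecting fresh Gaussian weight noise on each training
    forward pass, mimicking analog device noise (illustrative noise model)."""

    def __init__(self, in_features, out_features, noise_std=0.02, **kw):
        super().__init__(in_features, out_features, **kw)
        self.noise_std = noise_std

    def forward(self, x):
        if self.training:
            noise = self.noise_std * self.weight.abs().max() * torch.randn_like(self.weight)
            return F.linear(x, self.weight + noise, self.bias)
        return F.linear(x, self.weight, self.bias)

model = nn.Sequential(NoisyLinear(128, 256), nn.ReLU(), NoisyLinear(256, 10))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x, target = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(x), target)
loss.backward()
opt.step()

# Clip weights after each step so they fit the device's limited conductance
# range (a simple fixed bound here; the clipping rule is an assumption).
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, NoisyLinear):
            m.weight.clamp_(-0.5, 0.5)
```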
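Because training already forces robustness to noise and quantization, the resulting weights also survive simple digital quantization. Below is a generic round-to-nearest W4A8 fake-quantization sketch; the per-output-channel weight scales and per-tensor activation scale are my assumptions, not necessarily the paper's exact setting.

```python
import torch

def rtn_w4a8(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Round-to-nearest fake quantization: 4-bit weights (per output
    channel), 8-bit activations (per tensor). Returns the dequantized
    matmul result for simulation purposes."""
    w_scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7  # int4: [-8, 7]
    w_q = torch.round(w / w_scale).clamp(-8, 7) * w_scale
    x_scale = x.abs().max().clamp(min=1e-8) / 127                    # int8: [-128, 127]
    x_q = torch.round(x / x_scale).clamp(-128, 127) * x_scale
    return x_q @ w_q.T
```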