David Patterson: Challenges and Research Directions for LLM Inference Hardware
- #AI Inference
- #LLM
- #Hardware Architecture
- Large Language Model (LLM) inference is challenging chiefly because of the autoregressive Decode phase of Transformer models, which generates one token at a time and re-reads the model weights at every step.
- Primary challenges in LLM inference are memory and interconnect, not compute.
- Four architecture research opportunities are highlighted: High Bandwidth Flash, Processing-Near-Memory, 3D memory-logic stacking, and low-latency interconnect.
- The focus is on datacenter AI, but applicability to mobile devices is also reviewed.
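The memory-versus-compute point above can be illustrated with a back-of-envelope roofline calculation. This is a minimal sketch with illustrative numbers (a hypothetical 70B-parameter model and hypothetical accelerator specs), not figures from the talk:

```python
# Back-of-envelope roofline check: why autoregressive decode is
# memory-bound at small batch sizes. All numbers are illustrative
# assumptions, not figures from the talk.

def decode_arithmetic_intensity(n_params: float, batch: int,
                                bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one decode step.

    Each generated token touches every weight (~2 FLOPs per parameter
    for a multiply-accumulate); the weights are re-read from memory at
    every step, amortized only across the batch dimension.
    """
    flops = 2.0 * n_params * batch
    bytes_moved = n_params * bytes_per_param
    return flops / bytes_moved

# Hypothetical 70B-parameter model with 16-bit (2-byte) weights:
ai_batch1 = decode_arithmetic_intensity(70e9, batch=1)    # 1.0 FLOP/byte
ai_batch64 = decode_arithmetic_intensity(70e9, batch=64)  # 64.0 FLOPs/byte

# A hypothetical accelerator with 1000 TFLOP/s of compute and 3 TB/s of
# HBM bandwidth needs ~333 FLOPs/byte to be compute-bound; batch-1
# decode falls short by two orders of magnitude, so memory bandwidth,
# not compute, is the bottleneck.
ridge_point = 1000e12 / 3e12
print(ai_batch1, ai_batch64, round(ridge_point, 1))
```

Even at moderate batch sizes the Decode phase sits well below the ridge point, which is why the talk's research directions (High Bandwidth Flash, Processing-Near-Memory, 3D stacking, faster interconnect) all target the memory and communication system rather than raw FLOPs.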