David Patterson: Challenges and Research Directions for LLM Inference Hardware
- #AI Inference
- #LLM
- #Hardware Architecture
- Large Language Model (LLM) inference is challenging chiefly because of the autoregressive Decode phase of Transformer models, which generates one token at a time and re-reads the model weights at every step.
- Primary challenges in LLM inference are memory and interconnect, not compute.
- Four architecture research opportunities are highlighted: High Bandwidth Flash, Processing-Near-Memory, 3D memory-logic stacking, and low-latency interconnect.
- The focus is on datacenter AI, but applicability to mobile devices is also reviewed.
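The memory-versus-compute point above can be illustrated with a back-of-envelope roofline calculation. This is a minimal sketch with illustrative numbers (a hypothetical 70B-parameter model and hypothetical accelerator specs), not figures from the talk:

```python
# Back-of-envelope roofline check: why autoregressive decode is
# memory-bound at small batch sizes. All numbers are illustrative
# assumptions, not figures from the talk.

def decode_arithmetic_intensity(n_params: float, batch: int,
                                bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one decode step.

    Each generated token touches every weight (~2 FLOPs per parameter
    for a multiply-accumulate); the weights are re-read from memory at
    every step, amortized only across the batch dimension.
    """
    flops = 2.0 * n_params * batch
    bytes_moved = n_params * bytes_per_param
    return flops / bytes_moved

# Hypothetical 70B-parameter model with 16-bit (2-byte) weights:
ai_batch1 = decode_arithmetic_intensity(70e9, batch=1)    # 1.0 FLOP/byte
ai_batch64 = decode_arithmetic_intensity(70e9, batch=64)  # 64.0 FLOPs/byte

# A hypothetical accelerator with 1000 TFLOP/s of compute and 3 TB/s of
# HBM bandwidth needs ~333 FLOPs/byte to be compute-bound; batch-1
# decode falls short by two orders of magnitude, so memory bandwidth,
# not compute, is the bottleneck.
ridge_point = 1000e12 / 3e12
print(ai_batch1, ai_batch64, round(ridge_point, 1))
```

Even at moderate batch sizes the Decode phase sits well below the ridge point, which is why the talk's research directions (High Bandwidth Flash, Processing-Near-Memory, 3D stacking, faster interconnect) all target the memory and communication system rather than raw FLOPs.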