Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons
a year ago
- #Machine Learning
- #Efficient Architectures
- #Large Language Models
- Introduces a novel non-attention-based architecture for large language models (LLMs) that can handle ultra-long context windows (hundreds of thousands to millions of tokens).
- Avoids the quadratic memory and compute overhead of traditional Transformer designs by eliminating token-to-token attention entirely.
- Combines four components (see the sketch after this list):
  - State Space blocks (inspired by S4) for near-linear scaling with sequence length;
  - Multi-Resolution Convolution layers to capture local context;
  - a lightweight Recurrent Supervisor that maintains a compact global hidden state;
  - Retrieval-Augmented External Memory that stores and retrieves high-level chunk embeddings.
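A minimal PyTorch sketch of how these four pieces could fit together, under loud assumptions: the class names, shapes, toy diagonal recurrence, and pooled "supervisor" update are illustrative placeholders rather than the paper's implementation, and a real S4 layer uses a structured state matrix with an FFT/parallel-scan computation instead of a Python loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResConv(nn.Module):
    """Depthwise causal convolutions at several dilations for local context (assumed design)."""
    def __init__(self, d_model, kernel_size=3, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(d_model, d_model, kernel_size, groups=d_model,
                      padding=(kernel_size - 1) * d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):                 # x: (batch, seq, d_model)
        y = x.transpose(1, 2)             # -> (batch, d_model, seq)
        out = sum(conv(y)[..., : y.size(-1)] for conv in self.convs)  # trim to causal length
        return out.transpose(1, 2)

class ToySSM(nn.Module):
    """Toy diagonal state-space layer: h_t = a * h_{t-1} + b * x_t, linear in sequence length."""
    def __init__(self, d_model):
        super().__init__()
        self.a_logit = nn.Parameter(torch.zeros(d_model))  # per-channel decay, kept in (0, 1)
        self.b = nn.Parameter(torch.ones(d_model))

    def forward(self, x):                 # x: (batch, seq, d_model)
        a = torch.sigmoid(self.a_logit)
        h = x.new_zeros(x.size(0), x.size(2))
        outs = []
        for t in range(x.size(1)):        # sequential scan; real S4 parallelizes this step
            h = a * h + self.b * x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

class ChunkMemory:
    """Toy external memory: store chunk embeddings, retrieve nearest by cosine similarity."""
    def __init__(self):
        self.keys = []

    def add(self, emb):                   # emb: (d_model,)
        self.keys.append(emb.detach())

    def retrieve(self, query, k=4):       # query: (d_model,) -> (k', d_model)
        if not self.keys:
            return query.new_zeros(0, query.size(-1))
        keys = torch.stack(self.keys)
        sims = F.cosine_similarity(keys, query.unsqueeze(0), dim=-1)
        idx = sims.topk(min(k, keys.size(0))).indices
        return keys[idx]

class HybridBlock(nn.Module):
    """SSM for long-range mixing + multi-resolution conv for local detail,
    with a GRU 'supervisor' carrying a compact global state across chunks."""
    def __init__(self, d_model):
        super().__init__()
        self.ssm = ToySSM(d_model)
        self.conv = MultiResConv(d_model)
        self.supervisor = nn.GRUCell(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, global_state):   # x: (batch, seq, d_model); global_state: (batch, d_model)
        y = self.norm(x + self.ssm(x) + self.conv(x))
        # Update the global hidden state from a pooled summary of the current chunk.
        global_state = self.supervisor(y.mean(dim=1), global_state)
        return y + global_state.unsqueeze(1), global_state
```

In this sketch, a long document would be processed chunk by chunk: each chunk's pooled representation is written to `ChunkMemory`, and retrieved neighbors can be fed into the next chunk's input, so no step ever touches more than one chunk plus a handful of retrieved embeddings.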