Hierarchical Autoregressive Modeling for Memory-Efficient Language Generation
4 months ago
- #Language Generation
- #Machine Learning
- #Efficiency
- PHOTON introduces a hierarchical autoregressive model for efficient language generation.
- It replaces flat token scanning with vertical, multi-resolution context access.
- PHOTON maintains a hierarchy of latent streams, each summarizing the context at a different temporal resolution.
- Experiments show PHOTON achieves a better throughput-quality trade-off than Transformer-based baselines.
- PHOTON reduces KV-cache traffic, offering up to 1000x higher throughput per unit memory.
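The post does not give PHOTON's internals, but the core idea — replacing a flat token cache with multi-resolution summaries so memory grows slowly with context length — can be sketched. The function below is a hypothetical illustration (the name `multires_context`, the pooling scheme, and all parameters are assumptions, not PHOTON's actual method): it keeps only the most recent tokens at full resolution plus one mean-pooled summary vector per coarser level.

```python
import numpy as np

def multires_context(tokens: np.ndarray, window: int = 4, levels: int = 3):
    """Build a multi-resolution context pyramid (illustrative sketch only).

    A flat cache keeps all n token vectors; here we keep the most recent
    `window` tokens at full resolution plus one pooled summary vector per
    coarser level, so the cache holds O(window + levels) vectors.
    """
    context = [tokens[-window:]]      # finest level: recent raw tokens
    history = tokens[:-window]        # older tokens to be summarized
    stride = window
    for _ in range(levels):
        if len(history) == 0:
            break
        chunk = history[-stride:]     # next-older span at a coarser stride
        # collapse the span into a single summary vector
        context.append(chunk.mean(axis=0, keepdims=True))
        history = history[:-stride]
        stride *= 2                   # each level covers twice the span
    # order coarse-to-fine, oldest summaries first
    return np.concatenate(context[::-1], axis=0)

# Toy demo: 64 tokens with 8-dim embeddings
rng = np.random.default_rng(0)
toks = rng.standard_normal((64, 8))
ctx = multires_context(toks)
print(ctx.shape)  # (7, 8): 3 summaries + 4 recent tokens, vs. 64 for a flat cache
```

The memory saving in the sketch comes from attending to 7 vectors instead of 64; a real hierarchical model would learn the summarization rather than mean-pool, but the cache-size argument is the same.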