Virtual Width Networks (VWN)
8 days ago
- #Model Efficiency
- #Machine Learning
- #Neural Networks
- Introduces Virtual Width Networks (VWN), a framework that expands embedding space without increasing backbone compute.
- VWN decouples representational width from backbone width, maintaining efficiency while enhancing performance.
- In large-scale experiments, an 8× expansion of virtual width speeds up optimization by more than 2× for next-token prediction and by 3× for next-2-token prediction.
- Identifies a log-linear scaling relation between virtual width and loss reduction, suggesting virtual width as a new scaling dimension for large-model efficiency.
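
To make the decoupling idea concrete, here is a minimal pure-Python sketch of a model whose embedding width is `expansion` times its backbone width. The summary above does not say how VWN connects the wide and narrow spaces, so the `down` and `up` projections, the single `tanh` layer standing in for the backbone, and all function names are assumptions for illustration, not the authors' method.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.02):
    """Return a rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def shape(m):
    return (len(m), len(m[0]))


def init_params(vocab_size, backbone_width, expansion, seed=0):
    rng = random.Random(seed)
    # Assumption: virtual (embedding) width is an integer multiple of backbone width.
    virtual_width = expansion * backbone_width
    return {
        "embed": rand_matrix(vocab_size, virtual_width, rng),      # wide embeddings
        "down": rand_matrix(virtual_width, backbone_width, rng),   # hypothetical wide -> narrow map
        "backbone": rand_matrix(backbone_width, backbone_width, rng),  # stand-in for the backbone
        "up": rand_matrix(backbone_width, virtual_width, rng),     # hypothetical narrow -> wide map
    }


def forward(params, token_ids):
    wide = [params["embed"][t] for t in token_ids]        # (seq, virtual_width)
    narrow = matmul(wide, params["down"])                 # (seq, backbone_width)
    hidden = [[math.tanh(v) for v in row]
              for row in matmul(narrow, params["backbone"])]
    update = matmul(hidden, params["up"])                 # (seq, virtual_width)
    # Residual update applied in the wide space.
    return [[w + u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(wide, update)]


params = init_params(vocab_size=50, backbone_width=8, expansion=8)
out = forward(params, [3, 14, 15])
print(shape(params["backbone"]))  # (8, 8)
print(shape(out))                 # (3, 64)
```

In this sketch the backbone weight stays 8 × 8 whatever value `expansion` takes; only the embedding table and the two projections grow with it. That is the sense in which representational width is decoupled from backbone width here.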