Hasty Briefs (beta)

Virtual Width Networks (VWN)

8 days ago
  • #Model Efficiency
  • #Machine Learning
  • #Neural Networks
  • Introduces Virtual Width Networks (VWN), a framework that expands embedding space without increasing backbone compute.
  • VWN decouples representational width from backbone width, maintaining efficiency while enhancing performance.
  • In large-scale experiments, an 8x virtual-width expansion speeds up optimization by more than 2x for next-token prediction and by 3x for next-2-token prediction.
  • Identifies a log-linear scaling relation between virtual width and loss reduction, suggesting a new dimension for large-model efficiency.
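
To make the idea of decoupling representational width from backbone width concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's mechanism: the plain linear `down` and `up` projections, the single `tanh` layer standing in for the backbone, the tied output embedding, and all sizes are hypothetical choices made for this example. Only the 8x expansion factor comes from the summary above.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 1000   # hypothetical vocabulary size
d_backbone = 64     # backbone (hidden) width, hypothetical
expansion = 8       # virtual width factor, the 8x quoted in the summary
d_virtual = expansion * d_backbone

# The embedding table lives in the wide (virtual) space.
embedding = rng.normal(scale=0.02, size=(vocab_size, d_virtual))

# ASSUMPTION: plain linear maps between the virtual space and the backbone.
down = rng.normal(scale=d_virtual ** -0.5, size=(d_virtual, d_backbone))
up = rng.normal(scale=d_backbone ** -0.5, size=(d_backbone, d_virtual))

# Stand-in for one backbone layer; its cost depends only on d_backbone.
w_backbone = rng.normal(scale=d_backbone ** -0.5, size=(d_backbone, d_backbone))


def forward(token_ids):
    """Map token ids to logits while the backbone runs at the narrow width."""
    wide = embedding[token_ids]            # (seq, d_virtual)
    narrow = wide @ down                   # (seq, d_backbone)
    hidden = np.tanh(narrow @ w_backbone)  # backbone compute at d_backbone
    wide_out = wide + hidden @ up          # residual update in the virtual space
    return wide_out @ embedding.T          # logits from the tied wide embedding


tokens = np.array([1, 5, 42])
logits = forward(tokens)

# Multiply-add cost of the backbone layer per token: independent of `expansion`.
backbone_flops_per_token = 2 * d_backbone * d_backbone
```

In this sketch, raising `expansion` grows the embedding table and the two projections, while `backbone_flops_per_token` stays fixed. That is the sense in which the embedding space can widen without increasing backbone compute.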