Hasty Briefs

Diffusion Beats Autoregressive in Data-Constrained Settings

9 months ago
  • #diffusion-models
  • #machine-learning
  • #autoregressive-models
  • Autoregressive (AR) models have traditionally dominated large language modeling.
  • Diffusion-based language models are emerging as a promising alternative to AR models.
  • Diffusion models outperform AR models in data-constrained settings where compute is abundant but data is scarce.
  • Masked diffusion models achieve lower validation loss and better downstream performance by leveraging repeated data more effectively.
  • Diffusion models benefit from implicit data augmentation due to diverse token orderings and prediction tasks.
  • New scaling laws for diffusion models are identified, including a critical compute threshold beyond which diffusion outperforms AR.
  • When data is the bottleneck, diffusion models present a compelling alternative to AR models.
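The "implicit data augmentation" point can be made concrete with a minimal sketch (names and the toy task format are illustrative, not from the paper): an AR model always factorizes a sequence left-to-right, so repeated epochs over the same data present identical prediction tasks, whereas masked diffusion samples a fresh masking ratio and mask pattern each pass, yielding new prediction tasks from the same tokens.

```python
import random

def ar_prediction_tasks(tokens):
    # AR factorization: predict each token from its left context.
    # These tasks are fixed, so every epoch over the data repeats them exactly.
    return [(tuple(tokens[:i]), tokens[i]) for i in range(len(tokens))]

def masked_diffusion_tasks(tokens, rng):
    # Masked diffusion (sketch): sample a masking ratio, hide a random
    # subset of positions, and predict each hidden token from the rest.
    # A fresh mask each epoch turns repeated data into new tasks.
    ratio = rng.uniform(0.1, 0.9)
    masked = {i for i in range(len(tokens)) if rng.random() < ratio}
    visible = tuple(t if i not in masked else "[MASK]"
                    for i, t in enumerate(tokens))
    return [(visible, i, tokens[i]) for i in sorted(masked)]

toks = ["the", "cat", "sat", "down"]
# Two passes of AR training see the exact same prediction tasks:
print(ar_prediction_tasks(toks) == ar_prediction_tasks(toks))  # True
# Two passes of masked diffusion typically see different ones:
print(masked_diffusion_tasks(toks, random.Random(0)))
print(masked_diffusion_tasks(toks, random.Random(1)))
```

This is only a schematic of the training-task distribution, not the models themselves, but it illustrates why diffusion can extract more signal from each repeated epoch when data, not compute, is the bottleneck.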