Hasty Briefs (beta)

TREAD: Token Routing for Efficient Architecture-Agnostic Diffusion Training

6 days ago
  • #training-efficiency
  • #diffusion-models
  • #computer-vision
  • Diffusion models are the mainstream approach for visual generation but suffer from high training costs and sample inefficiency.
  • Existing methods for improving training efficiency come with tradeoffs, such as increased computational cost or reduced performance.
  • TREAD (Token Routing for Efficient Architecture-agnostic Diffusion Training) improves both training efficiency and generative performance simultaneously.
  • TREAD routes randomly selected tokens from early layers to deeper layers without architectural modifications or additional parameters.
  • The method is applicable to transformer-based and state-space models.
  • TREAD achieves a 14x convergence speedup over DiT at 400K training iterations, and a 37x speedup relative to DiT's best benchmark result, which requires 7M iterations.
  • It achieves competitive FID scores of 2.09 (guided) and 3.93 (unguided) on the ImageNet-256 benchmark, improving upon DiT.
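The routing idea in the bullets above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the authors' code): a random subset of tokens is kept in the computation for a middle span of layers, while the routed-out tokens skip those layers and are reinserted afterwards, so no new parameters or architectural changes are needed. All names (`tread_route`, `route_start`, `route_end`, `keep_ratio`) are ours.

```python
import numpy as np

def tread_route(tokens, layers, route_start, route_end, keep_ratio, rng):
    """Sketch of TREAD-style token routing (hypothetical helper).

    tokens: (num_tokens, dim) array; layers: list of callables.
    Tokens not in the kept subset bypass layers[route_start:route_end]
    and rejoin the sequence afterwards, unchanged.
    """
    n = tokens.shape[0]
    # All tokens pass through the early layers before the route begins.
    for layer in layers[:route_start]:
        tokens = layer(tokens)
    # Randomly select which tokens stay in the computation for the routed span.
    keep = rng.permutation(n)[: max(1, int(n * keep_ratio))]
    keep.sort()
    kept = tokens[keep]
    # Only the kept subset is processed by the middle layers (the saving).
    for layer in layers[route_start:route_end]:
        kept = layer(kept)
    # Reinsert processed tokens; routed-out tokens rejoin as they were.
    merged = tokens.copy()
    merged[keep] = kept
    # The remaining deep layers again see the full token sequence.
    for layer in layers[route_end:]:
        merged = layer(merged)
    return merged
```

With `keep_ratio=0.5` the middle layers process half the tokens, which is where the training-cost reduction comes from; because the selection is random per step, every token is still trained on in expectation.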