TREAD: Token Routing for Efficient Architecture-Agnostic Diffusion Training
- #training-efficiency
- #diffusion-models
- #computer-vision
- Diffusion models are the mainstream approach for visual generation but suffer from high training costs and sample inefficiency.
- Existing methods for improving training efficiency come with tradeoffs, such as increased computational cost or reduced performance.
- TREAD (Token Routing for Efficient Architecture-agnostic Diffusion Training) improves both training efficiency and generative performance simultaneously.
- TREAD routes a randomly selected subset of tokens past intermediate layers, reintroducing them at a deeper layer, without architectural modifications or additional parameters.
- The method is applicable to transformer-based and state-space models.
- TREAD converges 14x faster than DiT at 400K training iterations, and 37x faster relative to DiT's best benchmark result, which requires 7M iterations.
- It achieves competitive FID scores of 2.09 (guided) and 3.93 (unguided) on the ImageNet-256 benchmark, improving upon DiT.
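The routing idea can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: `tread_route`, `middle_layers`, and the 50% keep ratio are hypothetical placeholders showing how a random subset of tokens could pay the cost of the middle blocks while the rest skip past them and re-enter at their original positions.

```python
import random

def tread_route(num_tokens, keep_ratio=0.5, seed=0):
    """Hypothetical helper: split token indices into a kept set (processed
    by the middle layers) and a routed set (bypasses them entirely)."""
    rng = random.Random(seed)
    idx = list(range(num_tokens))
    rng.shuffle(idx)
    n_keep = int(num_tokens * keep_ratio)
    keep_idx = sorted(idx[:n_keep])    # processed by the middle layers
    route_idx = sorted(idx[n_keep:])   # routed past them, reinserted later
    return keep_idx, route_idx

def middle_layers(token):
    # Placeholder for the expensive intermediate blocks; a toy transform here.
    return [2.0 * v for v in token]

# Toy forward pass: 8 tokens with 4 channels each, all ones.
tokens = [[1.0] * 4 for _ in range(8)]
keep_idx, route_idx = tread_route(len(tokens), keep_ratio=0.5)

out = [t[:] for t in tokens]
for i in keep_idx:                  # only the kept tokens pay the compute cost
    out[i] = middle_layers(out[i])
# Routed tokens re-enter unchanged at their original positions.
```

Because no new parameters are introduced, the same routing scheme can wrap the middle blocks of a transformer or a state-space model, which is what makes the approach architecture-agnostic.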