Hasty Briefs


Show HN: "Be horse." – a diffusion language model on an M2 Air

10 hours ago
  • #Machine Learning
  • #PyTorch Training
  • #Diffusion Language Models
  • Diffusion Language Models (DLMs), currently a hot topic in machine learning, are trained by corrupting data with noise and then learning to reverse that corruption.
  • Unlike autoregressive models, which decode tokens left-to-right, diffusion models refine the entire sequence in parallel, potentially offering much higher tokens-per-second throughput, as seen in models like Mercury.
  • Training involves replacing random tokens in a text sequence with a [MASK] token, computing cross-entropy loss only on the masked positions, and passing the masking probability to the model as an additional input.
  • Decoding starts with every token set to [MASK] and proceeds through multiple denoising steps that gradually reveal the sequence; the post uses k = 20 steps as an example.
  • Although the model is undertrained and its outputs are largely nonsensical, it still produces real words and sentence-like structures, which is notable given that it was trained on limited hardware such as an M2 MacBook Air.
  • The project highlights the potential and fascination of diffusion models, with future interest in exploring their inner workings and multi-modal applications, while noting open questions around model performance and the fixed decoding length.
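The training recipe described above (mask random tokens, train on masked positions only, condition on the masking rate) can be sketched in a few lines. This is a minimal stdlib-only illustration, not the post's actual PyTorch code; the `MASK` sentinel and the `training_example` helper are names invented here for clarity.

```python
import random

MASK = -1  # hypothetical id standing in for the [MASK] token


def corrupt(tokens, p):
    """Forward process: mask each token independently with probability p."""
    return [MASK if random.random() < p else t for t in tokens]


def training_example(tokens):
    # Sample a masking probability, corrupt the sequence, and record which
    # positions were masked. In training, cross-entropy loss would be
    # computed only at those positions, and p itself would be fed to the
    # model as an extra conditioning input.
    p = random.random()
    corrupted = corrupt(tokens, p)
    loss_positions = [i for i, t in enumerate(corrupted) if t == MASK]
    return corrupted, loss_positions, p
```

Restricting the loss to masked positions is what makes the objective a denoising one: the model is never penalized for tokens it was already shown.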
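The decoding loop, starting from an all-[MASK] sequence and revealing tokens over k steps, can be sketched as below. The confidence-based unmasking schedule is an assumption (a common strategy in masked-diffusion decoders), not necessarily what the post implements, and the `model` interface here is hypothetical: it returns a (token, confidence) pair per position.

```python
MASK = -1  # hypothetical id standing in for the [MASK] token


def decode(model, length, k=20):
    """Iterative denoising: start fully masked, reveal tokens over k steps."""
    seq = [MASK] * length
    for step in range(k):
        preds = model(seq)  # assumed to return (token, confidence) per position
        # Linear schedule: how many tokens should remain masked after this step.
        remaining = int(length * (1 - (step + 1) / k))
        masked_idx = [i for i, t in enumerate(seq) if t == MASK]
        # Unmask the most confident predictions; re-mask the rest.
        masked_idx.sort(key=lambda i: preds[i][1], reverse=True)
        n_reveal = len(masked_idx) - remaining
        for i in masked_idx[:n_reveal]:
            seq[i] = preds[i][0]
    return seq
```

Note that `length` is fixed up front, which is exactly the fixed-decoding-length limitation the post mentions; by the final step `remaining` reaches zero, so every position is revealed.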