Show HN: "Be horse." – a diffusion language model on an M2 Air
10 hours ago
- #Machine Learning
- #PyTorch Training
- #Diffusion Language Models
- Diffusion Language Models (DLMs), currently a hot topic in machine learning, are trained by corrupting data with noise and then learning to reverse that corruption.
- Unlike autoregressive models, which decode tokens left-to-right, diffusion models decode the entire sequence in parallel, potentially offering much higher tokens-per-second throughput, as seen in models like Mercury2.
- Training involves replacing random tokens in a text sequence with a [MASK] token, computing cross-entropy loss only on the masked positions, and passing the masking probability to the model as an additional input.
- Decoding starts with every token set to [MASK] and proceeds through multiple denoising steps that gradually reveal the sequence; the example uses k=20 steps.
- Although the undertrained model's outputs are nonsensical, it still learns to produce real words and sentence-like structures, which is impressive given the limited hardware of an M2 MacBook Air.
- The project highlights the appeal of diffusion models; the author plans to dig into their inner workings and multi-modal applications, while noting open questions around model performance and the fixed decoding length.
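The training recipe in the bullets above can be sketched in PyTorch. This is a minimal toy illustration, not the post's actual code: `TinyDenoiser`, the vocabulary size, and the way the masking probability is injected are all assumptions; the key steps are masking random tokens, conditioning on the masking probability, and computing cross-entropy only on masked positions.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup (not from the post): small vocab, [MASK] is token id 0.
VOCAB, MASK_ID, SEQ_LEN, DIM = 100, 0, 16, 32

class TinyDenoiser(nn.Module):
    # Stand-in model: embeds tokens, adds a projection of the masking
    # probability (the extra input mentioned in the post), predicts logits.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.p_proj = nn.Linear(1, DIM)  # inject masking probability
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, p_mask):
        p = p_mask.view(-1, 1, 1).expand(-1, tokens.size(1), 1)
        h = self.embed(tokens) + self.p_proj(p)
        return self.out(h)

def training_step(model, batch):
    # Sample one masking probability per sequence, mask tokens independently.
    p = torch.rand(batch.size(0))
    mask = torch.rand_like(batch, dtype=torch.float) < p.unsqueeze(1)
    corrupted = torch.where(mask, torch.full_like(batch, MASK_ID), batch)
    logits = model(corrupted, p)
    # Cross-entropy loss on the masked positions only.
    return nn.functional.cross_entropy(logits[mask], batch[mask])

torch.manual_seed(0)  # reproducible sketch
model = TinyDenoiser()
batch = torch.randint(1, VOCAB, (4, SEQ_LEN))  # clean data avoids MASK_ID
loss = training_step(model, batch)
```

In a real run this loss would be backpropagated as usual; the only departures from standard masked-LM training are the variable masking rate and passing that rate to the model.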
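The parallel decoding loop can be sketched as follows, assuming a model with the `(tokens, p_mask)` signature used during training. The confidence-based re-masking schedule here (commit the highest-confidence predictions each round, linearly increasing the committed count over k steps) is a common choice for diffusion LMs, not necessarily what the post uses; `StubDenoiser` is an untrained stand-in.

```python
import torch

# Hypothetical toy setup: small vocab, [MASK] is token id 0.
VOCAB, MASK_ID, SEQ_LEN = 100, 0, 16

class StubDenoiser(torch.nn.Module):
    # Untrained stand-in with the training-time signature:
    # (tokens, p_mask) -> per-position logits.
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, 32)
        self.out = torch.nn.Linear(32, VOCAB)

    def forward(self, tokens, p_mask):
        return self.out(self.embed(tokens))

@torch.no_grad()
def decode(model, k=20, seq_len=SEQ_LEN):
    # Start fully masked; over k denoising steps, commit more tokens each round.
    tokens = torch.full((1, seq_len), MASK_ID)
    for step in range(1, k + 1):
        p_mask = torch.tensor([(tokens == MASK_ID).float().mean()])
        logits = model(tokens, p_mask)
        logits[..., MASK_ID] = float("-inf")  # never predict [MASK] itself
        conf, pred = logits.softmax(-1).max(-1)
        # Already-committed tokens always stay committed (infinite confidence);
        # reveal enough positions that step/k of the sequence is visible.
        conf = conf.masked_fill(tokens != MASK_ID, float("inf"))
        n_reveal = int(seq_len * step / k)
        keep = conf.topk(n_reveal, dim=-1).indices
        new_tokens = torch.full_like(tokens, MASK_ID)
        new_tokens.scatter_(1, keep, pred.gather(1, keep))
        tokens = torch.where(tokens != MASK_ID, tokens, new_tokens)
    return tokens

out = decode(StubDenoiser(), k=20)  # fully revealed sequence after k steps
```

At step k the schedule reveals every position, so the loop always terminates with a complete sequence; this also makes concrete the fixed-decoding-length limitation the post mentions, since `seq_len` is chosen up front.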