Show HN: "Be horse." – a diffusion language model on an M2 Air
10 hours ago
- #Machine Learning
- #PyTorch Training
- #Diffusion Language Models
- Diffusion Language Models (DLMs), currently a hot topic in machine learning, are trained by corrupting data with noise and then learning to reverse that corruption.
- Unlike autoregressive models, which decode tokens left-to-right, diffusion models decode the entire sequence in parallel, potentially offering much higher tokens-per-second throughput, as seen in models like Mercury2.
- Training involves replacing random tokens in a text sequence with a [MASK] token, computing cross-entropy loss only on the masked positions, and passing the masking probability to the model as an additional input.
- Decoding starts with every token set to [MASK] and proceeds through multiple denoising steps that gradually reveal the sequence; the example uses k=20 steps.
- Although the undertrained model's outputs are nonsensical, it still learns to produce real words and sentence-like structures, which is impressive given the limited hardware of an M2 MacBook Air.
- The project highlights the appeal of diffusion models; the author plans to dig into their inner workings and multi-modal applications, while noting open questions around model performance and the fixed decoding length.
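The training recipe in the bullets above can be sketched in PyTorch. This is a minimal toy illustration, not the post's actual code: `TinyDenoiser`, the vocabulary size, and the way the masking probability is injected are all assumptions; the key steps are masking random tokens, conditioning on the masking probability, and computing cross-entropy only on masked positions.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup (not from the post): small vocab, [MASK] is token id 0.
VOCAB, MASK_ID, SEQ_LEN, DIM = 100, 0, 16, 32

class TinyDenoiser(nn.Module):
    # Stand-in model: embeds tokens, adds a projection of the masking
    # probability (the extra input mentioned in the post), predicts logits.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.p_proj = nn.Linear(1, DIM)  # inject masking probability
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, p_mask):
        p = p_mask.view(-1, 1, 1).expand(-1, tokens.size(1), 1)
        h = self.embed(tokens) + self.p_proj(p)
        return self.out(h)

def training_step(model, batch):
    # Sample one masking probability per sequence, mask tokens independently.
    p = torch.rand(batch.size(0))
    mask = torch.rand_like(batch, dtype=torch.float) < p.unsqueeze(1)
    corrupted = torch.where(mask, torch.full_like(batch, MASK_ID), batch)
    logits = model(corrupted, p)
    # Cross-entropy loss on the masked positions only.
    return nn.functional.cross_entropy(logits[mask], batch[mask])

torch.manual_seed(0)  # reproducible sketch
model = TinyDenoiser()
batch = torch.randint(1, VOCAB, (4, SEQ_LEN))  # clean data avoids MASK_ID
loss = training_step(model, batch)
```

In a real run this loss would be backpropagated as usual; the only departures from standard masked-LM training are the variable masking rate and passing that rate to the model.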
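The parallel decoding loop can be sketched as follows, assuming a model with the `(tokens, p_mask)` signature used during training. The confidence-based re-masking schedule here (commit the highest-confidence predictions each round, linearly increasing the committed count over k steps) is a common choice for diffusion LMs, not necessarily what the post uses; `StubDenoiser` is an untrained stand-in.

```python
import torch

# Hypothetical toy setup: small vocab, [MASK] is token id 0.
VOCAB, MASK_ID, SEQ_LEN = 100, 0, 16

class StubDenoiser(torch.nn.Module):
    # Untrained stand-in with the training-time signature:
    # (tokens, p_mask) -> per-position logits.
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, 32)
        self.out = torch.nn.Linear(32, VOCAB)

    def forward(self, tokens, p_mask):
        return self.out(self.embed(tokens))

@torch.no_grad()
def decode(model, k=20, seq_len=SEQ_LEN):
    # Start fully masked; over k denoising steps, commit more tokens each round.
    tokens = torch.full((1, seq_len), MASK_ID)
    for step in range(1, k + 1):
        p_mask = torch.tensor([(tokens == MASK_ID).float().mean()])
        logits = model(tokens, p_mask)
        logits[..., MASK_ID] = float("-inf")  # never predict [MASK] itself
        conf, pred = logits.softmax(-1).max(-1)
        # Already-committed tokens always stay committed (infinite confidence);
        # reveal enough positions that step/k of the sequence is visible.
        conf = conf.masked_fill(tokens != MASK_ID, float("inf"))
        n_reveal = int(seq_len * step / k)
        keep = conf.topk(n_reveal, dim=-1).indices
        new_tokens = torch.full_like(tokens, MASK_ID)
        new_tokens.scatter_(1, keep, pred.gather(1, keep))
        tokens = torch.where(tokens != MASK_ID, tokens, new_tokens)
    return tokens

out = decode(StubDenoiser(), k=20)  # fully revealed sequence after k steps
```

At step k the schedule reveals every position, so the loop always terminates with a complete sequence; this also makes concrete the fixed-decoding-length limitation the post mentions, since `seq_len` is chosen up front.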