Diffusion Models Explained Simply
a year ago
- #diffusion-models
- #machine-learning
- #ai
- Diffusion models are trained to identify and remove noise from images based on captions.
- Unlike autoregressive transformer language models, which generate one token at a time, diffusion models operate on an entire image or tensor at once.
- Training involves adding noise to images and having the model predict the noise added.
- Inference starts with pure noise and iteratively removes the predicted noise, one step at a time, until an image emerges.
- Variational auto-encoders (VAEs) compress images into much smaller, random-looking latent tensors, so diffusion can run more cheaply on the compressed representation.
- Classifier-free guidance steers generation toward the caption by comparing the model's predictions with and without the caption and amplifying the difference.
- Diffusion sampling can be stopped early for a faster but noisier result, whereas an autoregressive transformer stopped early yields only an incomplete sequence.
- Video diffusion models treat entire video clips as single tensors, learning frame relationships.
- Text diffusion models add noise to text embeddings, but converting back to text is challenging.
- Diffusion models are powerful for images, videos, and audio, but text generation is less straightforward.
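
The training and inference points above can be made concrete with a small NumPy sketch. It assumes a DDPM-style linear noise schedule, which the summary does not specify, and `predict_noise` is a hypothetical stand-in for a trained network that takes a noisy tensor, a step index, and a caption (or `None` for the unconditional case). Function names and default values are illustrative only.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule: per-step noise levels and their cumulative signal fraction."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Training-time corruption: blend the clean tensor with Gaussian noise at step t."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise  # the model's training target is `noise`, given x_t and t

def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: amplify the difference the caption makes."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def sample(predict_noise, caption, shape, betas, alpha_bars, rng, scale=3.0):
    """Start from pure noise and remove the predicted noise one step at a time."""
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = guided_noise(predict_noise(x, t, None),
                           predict_noise(x, t, caption), scale)
        alpha = 1.0 - betas[t]
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha)
        # Fresh noise is re-injected at every step except the last one.
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else mean
    return x
```

With a real model, `predict_noise` would be the trained network, and in a latent diffusion setup `x` would be a VAE latent that is decoded to pixels after the loop. Note that running fewer steps also requires a schedule built for that step count; this sketch does not show the step-skipping samplers that are typically used for that.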