From Noise to Image – interactive guide to diffusion
2 days ago
- #AI
- #Text-to-Image
- #Diffusion Models
- The number of possible images is astronomically large, around 10^400,000, and almost all of them look like random noise.
- Diffusion models start from random noise and gradually remove it to form a coherent image, unlike human artists, who start from a blank canvas.
- Models operate in a compressed 'latent space' with fewer dimensions than the full image space, making the process more manageable.
- Text prompts are mapped to a high-dimensional 'embedding space' which acts as a compass for the diffusion process.
- The random seed determines the starting point in image space, so different seeds give slightly different results for the same prompt.
- The number of inference steps affects image quality: too few steps can send the result off track, while adding steps beyond a certain point brings little further improvement.
- Detailed prompts constrain the direction more tightly and tend to give better results than vague ones.
- The 'guidance scale' controls how strongly the model follows the prompt: higher values track the prompt more closely but can produce unnatural-looking images.
- Overall, a diffusion model's journey from noise to image is a walk through this vast space, steered by the prompt, random seed, step count, and guidance scale.
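
The points above about seed, step count, and guidance scale can be made concrete with a toy sketch. This is not a real diffusion model: the two-dimensional "latent space", the `TOY_EMBEDDINGS` table, the `RADIUS` constant, and the `nearest_match`, `sample`, and `off_track` helpers are all assumptions invented for illustration. The only piece borrowed from practice is the general shape of classifier-free guidance, which extrapolates from a prompt-free prediction toward the prompted one.

```python
import math
import random

# Hypothetical stand-in for a text encoder: each prompt maps to the centre of
# the region of latent space that "matches" it. Real embeddings have hundreds
# of dimensions; two are used here so the numbers stay readable.
TOY_EMBEDDINGS = {"cat": (3.0, 0.0), "dog": (0.0, 3.0)}
RADIUS = 1.0  # assumed: matching "images" lie on a circle of this radius


def nearest_match(x, center):
    """Closest point to x on the circle of matching images around center."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:  # degenerate case: pick an arbitrary direction
        return (center[0] + RADIUS, center[1])
    return (center[0] + RADIUS * dx / dist, center[1] + RADIUS * dy / dist)


def sample(prompt, seed, steps, guidance_scale=1.0):
    """Walk from seeded noise toward the prompt's region in `steps` moves."""
    center = TOY_EMBEDDINGS[prompt]
    rng = random.Random(seed)
    x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # the seed fixes the start
    for i in range(steps):
        t = 1.0 - i / steps  # noise level: 1 at the start, 1/steps at the end
        match = nearest_match(x, center)
        # At high noise the toy denoiser can only guess the blurry average
        # (the centre); as t falls it commits to a specific match.
        cond = tuple((1.0 - t) * m + t * c for m, c in zip(match, center))
        uncond = (0.0, 0.0)  # prompt-free guess: the overall average
        # Classifier-free guidance form: extrapolate from the prompt-free
        # guess toward (and, for scale > 1, past) the prompted guess.
        guided = tuple(u + guidance_scale * (c - u) for u, c in zip(uncond, cond))
        frac = 1.0 / (steps - i)  # share of the remaining distance to cover
        x = tuple(xi + frac * (g - xi) for xi, g in zip(x, guided))
    return x


def off_track(x, prompt):
    """Distance between x and the circle of matching images (0 is on it)."""
    cx, cy = TOY_EMBEDDINGS[prompt]
    return abs(math.hypot(x[0] - cx, x[1] - cy) - RADIUS)


if __name__ == "__main__":
    for steps in (1, 4, 20, 50):
        print(steps, off_track(sample("cat", seed=0, steps=steps), "cat"))
    print(sample("cat", seed=0, steps=20), sample("cat", seed=1, steps=20))
    print(off_track(sample("cat", seed=0, steps=20, guidance_scale=7.5), "cat"))
```

In this toy, with a guidance scale of 1 the final distance from the matching region works out to `RADIUS / steps`, so going from 4 to 20 steps helps far more than going from 20 to 50. Different seeds land on different points of the same region, and a large guidance scale pushes the result well past it, which loosely mirrors the "constrained but unnatural" behaviour described above. Real models behave far less tidily.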