From Noise to Image – interactive guide to diffusion

2 days ago

The number of possible images is astronomically large, on the order of 10^400,000, and almost all of them look like random noise.
Diffusion models start with random noise and gradually remove it to form coherent images, unlike humans who start with a blank canvas.
Models operate in a compressed 'latent space' with fewer dimensions than the full image space, making the process more manageable.
Text prompts are mapped to a high-dimensional 'embedding space' which acts as a compass for the diffusion process.
The random seed fixes the starting point in the image space, so the same prompt yields a slightly different image for each seed.
The number of inference steps affects image quality: too few steps can leave the result off track, while adding steps beyond a certain point yields little further improvement.
Detailed prompts constrain the direction more tightly and generally produce better results than vague prompts.
The 'guidance scale' determines how strongly the model follows the prompt, with higher values leading to more constrained but potentially unnatural images.
The diffusion model's journey from noise to image involves navigating through a vast space guided by the prompt, random seed, step count, and guidance scale.
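
The interplay of prompt, seed, step count, and guidance scale can be sketched with a toy example. This is not a real diffusion model and not the guide's own code: the "image" is just a point in two dimensions, the prompt is reduced to a hypothetical target point, and the update rule borrows the blending formula from classifier-free guidance, one common way a guidance scale is applied. The names generate, guided_direction, and STEP_FRACTION are invented for illustration.

```python
import math
import random

# Hypothetical constant: fraction of the guided direction applied at each step.
STEP_FRACTION = 0.3


def guided_direction(x, prompt_target, guidance_scale):
    """Blend an unconditional and a prompt-conditioned update.

    Follows the classifier-free-guidance pattern:
    uncond + guidance_scale * (cond - uncond).
    """
    # Toy "unconditional" prediction: pull toward the origin.
    uncond = [-xi for xi in x]
    # Toy "conditional" prediction: pull toward the prompt's target point.
    cond = [t - xi for xi, t in zip(x, prompt_target)]
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


def generate(prompt_target, seed, num_steps, guidance_scale):
    """Start from seeded Gaussian noise and take num_steps guided steps."""
    rng = random.Random(seed)  # the seed fixes the starting point
    x = [rng.gauss(0.0, 1.0) for _ in prompt_target]
    for _ in range(num_steps):
        direction = guided_direction(x, prompt_target, guidance_scale)
        x = [xi + STEP_FRACTION * di for xi, di in zip(x, direction)]
    return x


if __name__ == "__main__":
    target = [2.0, -1.0]  # assumed stand-in for a prompt embedding
    for steps in (2, 10, 50):
        point = generate(target, seed=0, num_steps=steps, guidance_scale=1.0)
        print(steps, "steps -> distance to target:", math.dist(point, target))
    print("guidance 3.0 ->", generate(target, seed=0, num_steps=50, guidance_scale=3.0))
```

In this sketch, a fixed seed always reproduces the same result, a small step count stops while the point is still far from the target, and extra steps beyond a few dozen change almost nothing. Raising the guidance scale above 1 pushes the point past the target, a rough analogue of the over-constrained, unnatural images described above.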
