Generating Cats with learned lookup tables
2 days ago
- #image generation
- #neural networks
- #machine learning
- Follow-up post on generating cats using lookup tables (LUTs) with a dictionary of 512 (later 64) learned 8x8 patterns.
- Surprisingly effective results despite initial doubts that the model's limited expressivity would be a bottleneck.
- Model uses a patch transformer with 16 self-attention blocks on 64 tokens per image (8x8 RGB patches).
- Each output 8x8 patch is a softmax-weighted sum over the 512 learned patterns, letting the model interpolate between dictionary entries.
- Training lerps the image toward noise and has the model predict the original image; inference starts from pure Gaussian noise.
- LUT entries are static 8x8 RGB patches during inference.
- Experiments with fewer dictionary entries (e.g., 64) and encouraging orthogonality via Gram matrix penalties.
- Dynamic LUT generation proposed to increase model capacity: the model outputs vectors whose outer products form the RGB patches instead of indexing static entries.
- Hierarchical LUTs tested for capturing both coarse and fine details by generating coefficients for a mip chain.
- Results show promising cat image generation across various LUT approaches.
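The softmax-over-dictionary decoding described above can be sketched as follows. This is a minimal NumPy sketch, not the post's actual implementation; the shapes (64 tokens per image, 512 dictionary entries, 8x8 RGB patches) follow the post, while variable names and the random initialization are placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((512, 8, 8, 3))  # learned LUT: 512 patterns of 8x8 RGB
logits = rng.standard_normal((64, 512))           # stand-in for the transformer's per-token output

weights = softmax(logits, axis=-1)                # (64, 512), each row sums to 1
# each patch is a convex combination of dictionary entries -> interpolation
patches = np.einsum('tk,khwc->thwc', weights, dictionary)  # (64, 8, 8, 3)

# reassemble the 64 patches into an 8x8 grid, giving a 64x64 RGB image
image = patches.reshape(8, 8, 8, 8, 3).transpose(0, 2, 1, 3, 4).reshape(64, 64, 3)
```

Because the weights are a softmax rather than a hard argmax, the output is differentiable with respect to both the logits and the dictionary, so both can be learned jointly.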
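The lerp-to-noise training objective can be illustrated in a few lines. A hedged sketch, assuming a linear interpolation schedule in `t ∈ [0, 1]` and an MSE loss on the predicted clean image; `pred` stands in for the model call, which is not shown:

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = rng.uniform(-1.0, 1.0, (64, 64, 3))  # clean image (assumed [-1, 1] range)
noise = rng.standard_normal(x0.shape)     # Gaussian noise target
t = rng.uniform()                         # noise level, t=0 clean, t=1 pure noise

x_t = (1.0 - t) * x0 + t * noise          # lerp the image toward noise

pred = x_t                                # placeholder for model(x_t, t)
loss = np.mean((pred - x0) ** 2)          # train to predict the original image
```

At inference time the same model is applied starting from `t = 1` (pure Gaussian noise) and stepped back toward `t = 0`.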
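The dynamic-LUT idea of building patches from outer products can be made concrete as a rank-1 construction. A sketch under the assumption that the model emits two spatial 8-vectors and one 3-vector of color coefficients per entry; the post's exact factorization may differ:

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(8)  # row profile (stand-in for a model output)
v = rng.standard_normal(8)  # column profile
c = rng.standard_normal(3)  # RGB coefficients

# rank-1 8x8x3 pattern: patch[h, w, ch] = u[h] * v[w] * c[ch]
patch = np.einsum('h,w,c->hwc', u, v, c)
```

Emitting 19 numbers (8 + 8 + 3) per entry instead of storing 192 static values lets the dictionary vary per image, trading a fixed LUT for model capacity.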
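The hierarchical mip-chain variant can be sketched as summing patterns generated at several resolutions. This is an illustrative guess at the mechanism (nearest-neighbor upsampling, a 2/4/8 chain, simple summation); the post's actual coefficient scheme is not specified here:

```python
import numpy as np

def upsample2x(img):
    # nearest-neighbor 2x upsampling in both spatial dimensions
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(4)
# one pattern per mip level for a single 8x8 patch: coarse (2x2) to fine (8x8)
levels = [rng.standard_normal((s, s, 3)) for s in (2, 4, 8)]

out = np.zeros((8, 8, 3))
for lvl in levels:
    while lvl.shape[0] < 8:
        lvl = upsample2x(lvl)
    out += lvl  # coarse levels set broad structure, fine levels add detail
```

The coarse levels only need a few coefficients to cover the whole patch, which is why a mip chain can capture both large-scale structure and fine texture cheaply.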