Generating Cats with learned lookup tables
2 days ago
- #image generation
- #neural networks
- #machine learning
- Follow-up post on generating cats using lookup tables (LUTs) with a dictionary of 512 (later 64) learned 8x8 patterns.
- Surprisingly effective results despite initial doubts that the model's limited expressivity would be a bottleneck.
- Model uses a patch transformer with 16 self-attention blocks on 64 tokens per image (8x8 RGB patches).
- Each output 8x8 patch is a softmax-weighted sum over the 512 learned patterns, letting the model interpolate between dictionary entries.
- Training lerps the image toward noise and has the model predict the original image; inference starts from pure Gaussian noise.
- LUT entries are static 8x8 RGB patches during inference.
- Experiments with fewer dictionary entries (e.g., 64) and encouraging orthogonality via Gram matrix penalties.
- Dynamic LUT generation proposed to increase model capacity: the model outputs vectors whose outer products form the RGB patches instead of indexing static entries.
- Hierarchical LUTs tested for capturing both coarse and fine details by generating coefficients for a mip chain.
- Results show promising cat image generation across various LUT approaches.
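The softmax-over-dictionary decoding described above can be sketched as follows. This is a minimal NumPy sketch, not the post's actual implementation; the shapes (64 tokens per image, 512 dictionary entries, 8x8 RGB patches) follow the post, while variable names and the random initialization are placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((512, 8, 8, 3))  # learned LUT: 512 patterns of 8x8 RGB
logits = rng.standard_normal((64, 512))           # stand-in for the transformer's per-token output

weights = softmax(logits, axis=-1)                # (64, 512), each row sums to 1
# each patch is a convex combination of dictionary entries -> interpolation
patches = np.einsum('tk,khwc->thwc', weights, dictionary)  # (64, 8, 8, 3)

# reassemble the 64 patches into an 8x8 grid, giving a 64x64 RGB image
image = patches.reshape(8, 8, 8, 8, 3).transpose(0, 2, 1, 3, 4).reshape(64, 64, 3)
```

Because the weights are a softmax rather than a hard argmax, the output is differentiable with respect to both the logits and the dictionary, so both can be learned jointly.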
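The lerp-to-noise training objective can be illustrated in a few lines. A hedged sketch, assuming a linear interpolation schedule in `t ∈ [0, 1]` and an MSE loss on the predicted clean image; `pred` stands in for the model call, which is not shown:

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = rng.uniform(-1.0, 1.0, (64, 64, 3))  # clean image (assumed [-1, 1] range)
noise = rng.standard_normal(x0.shape)     # Gaussian noise target
t = rng.uniform()                         # noise level, t=0 clean, t=1 pure noise

x_t = (1.0 - t) * x0 + t * noise          # lerp the image toward noise

pred = x_t                                # placeholder for model(x_t, t)
loss = np.mean((pred - x0) ** 2)          # train to predict the original image
```

At inference time the same model is applied starting from `t = 1` (pure Gaussian noise) and stepped back toward `t = 0`.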
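The dynamic-LUT idea of building patches from outer products can be made concrete as a rank-1 construction. A sketch under the assumption that the model emits two spatial 8-vectors and one 3-vector of color coefficients per entry; the post's exact factorization may differ:

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(8)  # row profile (stand-in for a model output)
v = rng.standard_normal(8)  # column profile
c = rng.standard_normal(3)  # RGB coefficients

# rank-1 8x8x3 pattern: patch[h, w, ch] = u[h] * v[w] * c[ch]
patch = np.einsum('h,w,c->hwc', u, v, c)
```

Emitting 19 numbers (8 + 8 + 3) per entry instead of storing 192 static values lets the dictionary vary per image, trading a fixed LUT for model capacity.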
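The hierarchical mip-chain variant can be sketched as summing patterns generated at several resolutions. This is an illustrative guess at the mechanism (nearest-neighbor upsampling, a 2/4/8 chain, simple summation); the post's actual coefficient scheme is not specified here:

```python
import numpy as np

def upsample2x(img):
    # nearest-neighbor 2x upsampling in both spatial dimensions
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(4)
# one pattern per mip level for a single 8x8 patch: coarse (2x2) to fine (8x8)
levels = [rng.standard_normal((s, s, 3)) for s in (2, 4, 8)]

out = np.zeros((8, 8, 3))
for lvl in levels:
    while lvl.shape[0] < 8:
        lvl = upsample2x(lvl)
    out += lvl  # coarse levels set broad structure, fine levels add detail
```

The coarse levels only need a few coefficients to cover the whole patch, which is why a mip chain can capture both large-scale structure and fine texture cheaply.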