Hasty Briefs (beta)

Generating Cats with learned lookup tables

2 days ago
  • #image generation
  • #neural networks
  • #machine learning
  • Follow-up post on generating cats using lookup tables (LUTs) with a dictionary of 512 or 64 learned 8x8 patterns.
  • Surprisingly effective results despite initial doubts about the model's expressivity limitations.
  • Model uses a patch transformer with 16 self-attention blocks operating on 64 tokens per image, one token per 8x8 RGB patch.
  • Each 8x8 patch is a softmax-weighted sum over the 512 learned patterns, which allows interpolation between entries.
  • Training linearly interpolates (lerps) each image toward noise and has the model predict the original image; inference starts from Gaussian noise.
  • LUT entries are static 8x8 RGB patches during inference.
  • Experiments with fewer dictionary entries (e.g., 64) and encouraging orthogonality via Gram matrix penalties.
  • Dynamic LUT generation proposed to increase model capacity by outputting vectors for RGB outer products.
  • Hierarchical LUTs tested for capturing both coarse and fine details by generating coefficients for a mip chain.
  • Results show promising cat image generation across various LUT approaches.
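
The softmax-weighted lookup in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the post's code: the names `softmax`, `decode_patch`, `lut`, and `logits` are hypothetical, the table is filled with random values instead of learned ones, and the logits stand in for whatever the transformer emits per token.

```python
import math
import random

NUM_PATTERNS = 512        # dictionary size from the summary (64 in the smaller variant)
PATCH_VALUES = 8 * 8 * 3  # one flattened 8x8 RGB patch


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_patch(logits, lut):
    """Blend all LUT entries with softmax weights into one flattened patch."""
    weights = softmax(logits)
    patch = [0.0] * len(lut[0])
    for w, pattern in zip(weights, lut):
        for i, value in enumerate(pattern):
            patch[i] += w * value
    return patch


rng = random.Random(0)
# Assumption: random values stand in for the learned, static LUT entries.
lut = [[rng.random() for _ in range(PATCH_VALUES)] for _ in range(NUM_PATTERNS)]
# Assumption: random logits stand in for the transformer's output for one token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_PATTERNS)]
patch = decode_patch(logits, lut)
print(len(patch))
```

Because the weights are non-negative and sum to one, every output patch is a convex combination of table entries; a very peaked set of logits selects a single entry, while flatter logits interpolate between several.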
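
The lerp-to-noise training setup mentioned above might look something like this sketch. The helper names `lerp_to_noise` and `make_training_pair` are hypothetical, and two details are assumptions the summary does not state: that the noise level `t` is drawn uniformly, and that it is handed to the model alongside the noisy input.

```python
import random


def lerp_to_noise(image, noise, t):
    """Blend a clean image toward noise: t=0 gives the image, t=1 gives pure noise."""
    return [(1.0 - t) * x + t * n for x, n in zip(image, noise)]


def make_training_pair(image, rng):
    """Build one (noisy input, noise level, target) training example."""
    t = rng.random()                               # assumption: uniform noise level
    noise = [rng.gauss(0.0, 1.0) for _ in image]   # Gaussian noise, as at inference
    noisy = lerp_to_noise(image, noise, t)
    return noisy, t, image                         # the target is the original image


rng = random.Random(0)
image = [0.2, 0.5, 0.9]  # toy stand-in for flattened pixel values
noisy, t, target = make_training_pair(image, rng)
print(len(noisy), target == image)
```

At `t = 1` the input is pure Gaussian noise, which matches the starting point the summary gives for inference.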
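
One common way to express a Gram matrix penalty for orthogonality is sketched below. The summary does not give the exact loss, so this form is an assumption: entries are normalized to unit length and the penalty is the squared difference between their Gram matrix and the identity. The names `normalize` and `gram_penalty` are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def gram_penalty(lut):
    """Sum of squared differences between the Gram matrix of the
    unit-normalized LUT entries and the identity matrix."""
    unit = [normalize(pattern) for pattern in lut]
    penalty = 0.0
    for i, a in enumerate(unit):
        for j, b in enumerate(unit):
            dot = sum(x * y for x, y in zip(a, b))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


print(gram_penalty([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal entries
print(gram_penalty([[1.0, 0.0], [1.0, 0.0]]))  # duplicate entries
```

The penalty is zero only when all entries are mutually orthogonal. A flattened 8x8 RGB patch has 192 values, so 64 entries can in principle reach zero while 512 entries cannot.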