Hasty Briefs (beta)

Playing with Vision Embeddings

a day ago
  • #feature visualization
  • #interpretability
  • #vision embeddings
  • DINOv3 ViT-S compresses images into 384-dimensional embeddings with minimal priors.
  • Images are generated from embeddings via gradient-based optimization with augmentations.
  • Models use superposition to pack many features into limited dimensions.
  • Sparse Autoencoders (SAEs) help isolate interpretable feature directions.
  • Decomposing embeddings shows that they encode components such as trees, fences, and bridges.
  • Feature addition and interpolation reveal blending or juxtaposition of concepts.
  • Case studies illustrate nuanced features, such as single vs. multiple strawberries.
  • Feature coactivation maps visualize relationships in the embedding space.
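
The point about generating images from embeddings can be illustrated with a toy sketch. This is not the article's code: a small fixed linear map (`W_TOY`, hypothetical) stands in for DINOv3, a 4-value vector stands in for the image, and Gaussian jitter stands in for the image augmentations. The names `embed` and `invert_embedding` are made up for this example. The idea it shows is only the general one: start from a blank input and follow the gradient of the embedding mismatch until the input's embedding matches the target.

```python
import random

# Hypothetical stand-in encoder: a fixed 2x4 linear map (NOT DINOv3).
W_TOY = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]


def embed(W, x):
    """Toy 'embedding': matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def invert_embedding(W, target, steps=200, noise=0.01, seed=0):
    """Gradient descent on the input so that embed(W, x) approaches target.

    Each step embeds a jittered copy of x (a crude stand-in for image
    augmentations) and minimises the squared error to the target.
    """
    rng = random.Random(seed)
    n_out, n_in = len(W), len(W[0])
    # Conservative step size: 0.5 / ||W||_F^2 keeps plain gradient
    # descent stable for this quadratic loss.
    lr = 0.5 / sum(w * w for row in W for w in row)
    x = [0.0] * n_in  # start from a "blank image"
    for _ in range(steps):
        x_aug = [xi + rng.gauss(0.0, noise) for xi in x]
        residual = [e - t for e, t in zip(embed(W, x_aug), target)]
        # Gradient of sum(residual^2) with respect to x is 2 * W^T @ residual.
        grad = [
            2.0 * sum(W[j][i] * residual[j] for j in range(n_out))
            for i in range(n_in)
        ]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


target_embedding = [1.0, -2.0]
recovered = invert_embedding(W_TOY, target_embedding)
print("recovered input:", recovered)
print("its embedding:  ", embed(W_TOY, recovered))
```

Because the toy input has more values than the embedding has dimensions, many different inputs share the same embedding, and the optimisation simply lands on one of them. With a real vision model the same loop would typically be written with an autodiff framework rather than a hand-derived gradient.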