Playing with Vision Embeddings
a day ago
- #feature visualization
- #interpretability
- #vision embeddings
- DINOv3 ViT-S compresses images into 384-dimensional embeddings with minimal priors.
- Images can be generated from embeddings via gradient-based optimization with augmentations.
- Models use superposition to pack many features into limited dimensions.
- Sparse Autoencoders (SAEs) help isolate interpretable feature directions.
- Decomposition shows that embeddings encode components such as trees, fences, and bridges.
- Feature addition and interpolation reveal blending or juxtaposition of concepts.
- Case studies illustrate nuanced features, such as single vs. multiple strawberries.
- Feature coactivation maps visualize relationships in the embedding space.
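
To make the decomposition, feature-addition, interpolation, and coactivation points above concrete, here is a minimal toy sketch in NumPy. It does not use DINOv3 or a trained SAE: the "dictionary" is a set of random unit vectors standing in for learned feature directions, the encoder is a simple tied-weight projection with a ReLU and a fixed threshold, and every name (`dictionary`, `encode`, `compose`, the feature indices, the 2048-feature count, the 0.5 threshold) is a hypothetical chosen for illustration. Only the 384-dimensional embedding size comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 384          # embedding size mentioned in the text
N_FEATURES = 2048  # assumption: more features than dimensions (superposition)

# Hypothetical feature dictionary: one random unit-norm direction per feature.
# A trained SAE would learn these directions instead of sampling them.
dictionary = rng.standard_normal((N_FEATURES, DIM))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)


def encode(embedding, threshold=0.5):
    """Toy SAE-style encoder: project onto each direction, subtract a bias, ReLU."""
    return np.maximum(dictionary @ embedding - threshold, 0.0)


def active_features(embedding):
    """Indices of features with a non-zero activation."""
    return set(np.flatnonzero(encode(embedding)).tolist())


def compose(feature_ids):
    """Build a synthetic embedding as a sum of feature directions."""
    return dictionary[list(feature_ids)].sum(axis=0)


# Decomposition: two synthetic "image embeddings" built from a few features each.
emb_a = compose([3, 17, 256])
emb_b = compose([17, 900])
print("features in emb_a:", sorted(active_features(emb_a)))
print("features in emb_b:", sorted(active_features(emb_b)))

# Feature addition: add one more direction and re-encode.
emb_a_plus = emb_a + dictionary[1200]
print("after adding 1200:", sorted(active_features(emb_a_plus)))

# Interpolation: track how strongly feature 3 is expressed while moving
# linearly from emb_a (which contains it) to emb_b (which does not).
projections = [
    float(dictionary[3] @ ((1 - t) * emb_a + t * emb_b))
    for t in np.linspace(0.0, 1.0, 5)
]
print("feature 3 along the path:", np.round(projections, 2))

# Coactivation: count how often each pair of features fires together
# across a small batch of embeddings.
codes = np.stack([encode(e) for e in (emb_a, emb_b, emb_a_plus)])
active = (codes > 0).astype(int)
coactivation = active.T @ active  # shape (N_FEATURES, N_FEATURES)
print("coactivation of (3, 17):", coactivation[3, 17])
```

Because random directions in 384 dimensions are nearly orthogonal, a fixed threshold is enough to separate active from inactive features in this toy setting. Real embeddings are noisier, which is one reason a trained SAE with learned weights and biases is used instead of a hand-picked threshold.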