Vision Banana: Image Generators Are Generalist Vision Learners
11 hours ago
- #zero-shot learning
- #computer vision
- #image analysis
- Vision Banana lets users hover over or tap an image to reveal different kinds of information about it, such as segmentation, depth, and surface normal maps.
- The model achieves state-of-the-art performance in zero-shot transfer across 2D and 3D vision tasks.
- The work is detailed in a 2026 arXiv preprint titled 'Image Generators are Generalist Vision Learners', authored by a large team of researchers.