Hasty Briefs (beta)

Vision Banana: Image Generators Are Generalist Vision Learners

11 hours ago
  • #zero-shot learning
  • #computer vision
  • #image analysis
  • Vision Banana lets users hover over or tap an image to reveal outputs such as segmentation, depth, and normal maps.
  • The model achieves state-of-the-art performance in zero-shot transfer across 2D and 3D vision tasks.
  • The work is detailed in a 2026 arXiv preprint titled 'Image Generators are Generalist Vision Learners', authored by a large team of researchers.