Hasty Briefs

DINOv3: Self-supervised learning for vision at unprecedented scale

9 days ago
  • #deep learning
  • #self-supervised learning
  • #computer vision
  • DINOv3 introduces a self-supervised learning (SSL) approach for images that produces strong, universal vision backbones.
  • It scales to 7B-parameter models and a 1.7B-image training set while using less compute than weakly supervised methods.
  • Achieves state-of-the-art performance across diverse domains without finetuning.
  • Produces high-resolution dense features that support tasks such as object detection, depth estimation, and segmentation.
  • Ships as a model suite for different use cases: Vision Transformers (ViTs) in several sizes plus efficient ConvNeXt models.
  • Outperforms weakly supervised models on tasks such as fine-grained classification, semantic segmentation, and object tracking.
  • Applications include tree canopy height measurement, Mars exploration, and cancer treatment predictions.
  • Pre-trains a large model on unlabeled data, then distills it into smaller, more efficient models after training.
  • DINOv3 builds on DINOv2, increasing model size by 6x and training data by 12x.
  • Evolution from DINO (80M parameters) to DINOv2 (1B parameters) to DINOv3 (7B parameters).
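To make the dense-feature bullet more concrete: a ViT backbone emits one feature vector per image patch. The output of a frozen backbone can therefore be turned into a coarse segmentation map by applying a per-patch linear classifier (a "linear probe") and then upsampling the result. The sketch below is only an illustration of that general probing technique, not DINOv3's own evaluation heads. It assumes a patch size of 16 pixels, and the feature vectors, classifier weights and function names are all hypothetical.

```python
def patch_grid(height, width, patch_size=16):
    """Rows and columns of patch tokens for an image of this size.

    patch_size=16 is an assumption for illustration only.
    """
    return height // patch_size, width // patch_size


def linear_probe_segmentation(patch_features, grid_h, grid_w, weights, bias):
    """Classify each patch with a linear layer (hypothetical weights).

    patch_features: grid_h * grid_w feature vectors in row-major order.
    weights: one weight vector per class; bias: one scalar per class.
    Returns a grid_h x grid_w map of predicted class indices.
    """
    if len(patch_features) != grid_h * grid_w:
        raise ValueError("expected one feature vector per patch")
    labels = []
    for feat in patch_features:
        # Score every class as a dot product plus bias, keep the best one.
        scores = [sum(w * f for w, f in zip(w_c, feat)) + b_c
                  for w_c, b_c in zip(weights, bias)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    # Cut the flat label list back into rows of the patch grid.
    return [labels[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


def upsample_nearest(label_map, factor):
    """Nearest-neighbour upsampling of a label grid toward pixel resolution."""
    out = []
    for row in label_map:
        wide = [lab for lab in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# Hypothetical 2x2 grid of 2-dimensional patch features, two classes.
features = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.7], [0.6, 0.3]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
coarse = linear_probe_segmentation(features, 2, 2, weights, bias)
print(coarse)  # [[0, 1], [1, 0]]
print(upsample_nearest(coarse, 2))
```

The patch grid is `height // patch_size` by `width // patch_size`. Raising the input resolution (for example from 224 to 512 pixels) therefore increases the number of patch features and the spatial detail of the resulting map. This is why high-resolution dense features matter for segmentation and depth estimation.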
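The pre-training and distillation bullet rests on a teacher-student idea. One way to illustrate it is a DINO-style objective: a student network is trained to match a teacher's output distribution. The teacher's output is centered and sharpened so that training does not collapse to a trivial solution. The sketch below is a minimal pure-Python illustration under that assumption, not DINOv3's actual training code. The function names are hypothetical, and the default temperatures and center momentum are illustrative values in the range reported for the original DINO.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def softmax(logits, temperature):
    """Probabilities from logits / temperature."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def distillation_loss(student_logits, teacher_logits, center,
                      student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy H(teacher, student), DINO-style (illustrative defaults).

    The teacher logits are centered (a running mean is subtracted) and
    sharpened (lower temperature), then treated as a fixed target.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = softmax(centered, teacher_temp)
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def update_center(center, teacher_batch, momentum=0.9):
    """Exponential moving average of the center over a batch of teacher logits."""
    batch_mean = [sum(col) / len(teacher_batch) for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b
            for c, b in zip(center, batch_mean)]


# Hypothetical 3-dimensional outputs for one image view.
center = [0.0, 0.0, 0.0]
teacher = [2.0, 0.5, -1.0]
aligned_student = [2.0, 0.5, -1.0]
misaligned_student = [-1.0, 0.5, 2.0]
print(distillation_loss(aligned_student, teacher, center))     # close to 0
print(distillation_loss(misaligned_student, teacher, center))  # much larger
center = update_center(center, [teacher])
```

The same loss shape works whether the teacher is a momentum copy of the student during pre-training or a frozen large model being distilled into a smaller one. Only where the teacher logits come from changes.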