DINOv3: Self-supervised learning for vision at unprecedented scale
- #deep learning
- #self-supervised learning
- #computer vision
- DINOv3 introduces a self-supervised learning (SSL) approach for images that produces strong, general-purpose vision backbones.
- It scales SSL training to a 7B-parameter model and a 1.7B-image dataset while using less compute than weakly-supervised methods.
- The frozen backbone achieves state-of-the-art performance across diverse domains without fine-tuning.
- Its high-resolution dense features enable tasks such as object detection, depth estimation, and segmentation (see the feature-extraction sketch after this list).
- Includes a comprehensive model suite for various use cases, spanning multiple ViT sizes and efficient ConvNeXt models.
- Outperforms weakly-supervised models in tasks like fine-grained classification, semantic segmentation, and object tracking.
- Applications include tree canopy height measurement, Mars exploration, and predicting responses to cancer treatments.
- Pre-training learns from unlabeled data; the resulting large model is then distilled into smaller, efficient models post-training (a rough distillation sketch follows the list).
- DINOv3 builds on DINOv2, increasing model size by 6x and training data by 12x.
- Evolution from DINO (80M parameters) to DINOv2 (1B parameters) to DINOv3 (7B parameters).
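
Below is a minimal sketch of how a frozen DINOv3-style backbone's dense patch features could drive a dense prediction head such as segmentation. The `facebookresearch/dinov3` torch.hub entry point, the `dinov3_vits16` model name, and the `forward_features()["x_norm_patchtokens"]` accessor are assumptions patterned on the DINOv2 release, not confirmed DINOv3 API.

```python
# Hypothetical sketch: dense patch features from a frozen DINOv3-style backbone
# feeding a lightweight segmentation head. Hub repo, model name, and the
# forward_features() accessor are assumptions modeled on the DINOv2 release.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vits16")  # assumed entry point
backbone.eval()  # frozen backbone: no fine-tuning

patch_size = 16
embed_dim = 384   # ViT-S width (assumed)
num_classes = 21  # e.g. PASCAL VOC classes

# Linear head over per-patch features, in the spirit of linear-probe evaluations.
head = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

def segment(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) with H and W divisible by the patch size."""
    b, _, h, w = image.shape
    with torch.no_grad():
        # Assumed to return per-patch tokens of shape (B, N, D),
        # mirroring DINOv2's forward_features()["x_norm_patchtokens"].
        feats = backbone.forward_features(image)["x_norm_patchtokens"]
    gh, gw = h // patch_size, w // patch_size
    feats = feats.reshape(b, gh, gw, embed_dim).permute(0, 3, 1, 2)  # (B, D, gh, gw)
    logits = head(feats)                                             # (B, C, gh, gw)
    # Upsample coarse patch-level logits back to pixel resolution.
    return nn.functional.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

out = segment(torch.randn(1, 3, 512, 512))
print(out.shape)  # torch.Size([1, 21, 512, 512])
```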
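
And a rough illustration of the post-training distillation step: a large frozen teacher's patch features are matched by a smaller student through a learned projection. The cosine-similarity objective, the projection layer, and the dimensions below are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of distilling a large frozen teacher into a smaller student
# by matching per-patch features. Loss, projection, and widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student: nn.Module, student_dim: int, teacher_dim: int):
        super().__init__()
        self.student = student
        # Project student features into the teacher's embedding space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def loss(self, images: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        # Student is assumed to output per-patch features of shape (B, N, student_dim).
        student_feats = self.proj(self.student(images))  # (B, N, teacher_dim)
        # Minimize 1 - cosine similarity between matched patch features.
        return 1.0 - F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()

# Training-loop outline (teacher frozen, only the student and projection update):
#   with torch.no_grad():
#       teacher_feats = teacher_backbone(images)  # (B, N, teacher_dim)
#   loss = distiller.loss(images, teacher_feats)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```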