DINOv3: Self-supervised learning for vision at unprecedented scale
- #deep learning
- #self-supervised learning
- #computer vision
- DINOv3 introduces a self-supervised learning (SSL) approach for images that produces strong, general-purpose vision backbones.
- It scales SSL training to a 7B-parameter model and a 1.7B-image dataset while using less compute than weakly-supervised methods.
- The frozen backbone achieves state-of-the-art performance across diverse domains without fine-tuning.
- Its high-resolution dense features enable tasks such as object detection, depth estimation, and segmentation (see the feature-extraction sketch after this list).
- Includes a comprehensive model suite for various use cases, spanning multiple ViT sizes and efficient ConvNeXt models.
- Outperforms weakly-supervised models in tasks like fine-grained classification, semantic segmentation, and object tracking.
- Applications include tree canopy height measurement, Mars exploration, and predicting responses to cancer treatments.
- Pre-training learns from unlabeled data; the resulting large model is then distilled into smaller, efficient models post-training (a rough distillation sketch follows the list).
- DINOv3 builds on DINOv2, increasing model size by 6x and training data by 12x.
- Evolution from DINO (80M parameters) to DINOv2 (1B parameters) to DINOv3 (7B parameters).
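
Below is a minimal sketch of how a frozen DINOv3-style backbone's dense patch features could drive a dense prediction head such as segmentation. The `facebookresearch/dinov3` torch.hub entry point, the `dinov3_vits16` model name, and the `forward_features()["x_norm_patchtokens"]` accessor are assumptions patterned on the DINOv2 release, not confirmed DINOv3 API.

```python
# Hypothetical sketch: dense patch features from a frozen DINOv3-style backbone
# feeding a lightweight segmentation head. Hub repo, model name, and the
# forward_features() accessor are assumptions modeled on the DINOv2 release.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vits16")  # assumed entry point
backbone.eval()  # frozen backbone: no fine-tuning

patch_size = 16
embed_dim = 384   # ViT-S width (assumed)
num_classes = 21  # e.g. PASCAL VOC classes

# Linear head over per-patch features, in the spirit of linear-probe evaluations.
head = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

def segment(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) with H and W divisible by the patch size."""
    b, _, h, w = image.shape
    with torch.no_grad():
        # Assumed to return per-patch tokens of shape (B, N, D),
        # mirroring DINOv2's forward_features()["x_norm_patchtokens"].
        feats = backbone.forward_features(image)["x_norm_patchtokens"]
    gh, gw = h // patch_size, w // patch_size
    feats = feats.reshape(b, gh, gw, embed_dim).permute(0, 3, 1, 2)  # (B, D, gh, gw)
    logits = head(feats)                                             # (B, C, gh, gw)
    # Upsample coarse patch-level logits back to pixel resolution.
    return nn.functional.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

out = segment(torch.randn(1, 3, 512, 512))
print(out.shape)  # torch.Size([1, 21, 512, 512])
```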
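
And a rough illustration of the post-training distillation step: a large frozen teacher's patch features are matched by a smaller student through a learned projection. The cosine-similarity objective, the projection layer, and the dimensions below are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of distilling a large frozen teacher into a smaller student
# by matching per-patch features. Loss, projection, and widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student: nn.Module, student_dim: int, teacher_dim: int):
        super().__init__()
        self.student = student
        # Project student features into the teacher's embedding space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def loss(self, images: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        # Student is assumed to output per-patch features of shape (B, N, student_dim).
        student_feats = self.proj(self.student(images))  # (B, N, teacher_dim)
        # Minimize 1 - cosine similarity between matched patch features.
        return 1.0 - F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()

# Training-loop outline (teacher frozen, only the student and projection update):
#   with torch.no_grad():
#       teacher_feats = teacher_backbone(images)  # (B, N, teacher_dim)
#   loss = distiller.loss(images, teacher_feats)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```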