DINOv3
- #deep-learning
- #computer-vision
- #transformers
- DINOv3 backbones are now available on Hugging Face Hub and supported by the Hugging Face Transformers library.
- DINOv3 models produce high-quality dense features and perform well on various vision tasks without fine-tuning.
- Models include ViT and ConvNeXt architectures pretrained on the LVD-1689M web-image dataset and the SAT-493M satellite-imagery dataset.
- Instructions provided for loading models via PyTorch Hub and Hugging Face Transformers (see the loading sketch after this list).
- Example code snippets demonstrate how to extract image embeddings using DINOv3 models (an embedding sketch follows below).
- Training and evaluation setup requires PyTorch >= 2.7.1 and specific dependencies (a version check is sketched below).
- Notebooks available for tasks like PCA of patch features, foreground segmentation, and dense/sparse matching (a PCA sketch follows below).
- Training DINOv3 involves stages like pretraining, Gram anchoring, and high-resolution adaptation (a Gram-anchoring sketch follows below).
- DINOv3 code and model weights are released under the DINOv3 License.
- Citation provided for referencing the DINOv3 paper.
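
For the loading instructions above, a minimal sketch covering both routes. The Hub entrypoint name, the weights path, and the Transformers checkpoint id are assumptions; check the repo's hubconf.py and the Hugging Face collection for the released names.

```python
import torch
from transformers import AutoModel

# Via PyTorch Hub -- entrypoint name and weights path are assumptions;
# see the repo's hubconf.py for the actual entrypoints.
model_hub = torch.hub.load(
    "facebookresearch/dinov3",
    "dinov3_vits16",
    weights="/path/to/dinov3_vits16_pretrain.pth",
)

# Via Hugging Face Transformers -- checkpoint id is an assumption;
# see the DINOv3 collection on the Hub for the released ids.
model_hf = AutoModel.from_pretrained("facebook/dinov3-vits16-pretrain-lvd1689m")
```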
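
A sketch of embedding extraction with Transformers, assuming the checkpoint id above and the usual CLS-token/pooler layout:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "facebook/dinov3-vits16-pretrain-lvd1689m"  # assumed id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

inputs = processor(images=Image.open("example.jpg"), return_tensors="pt")
with torch.inference_mode():
    outputs = model(**inputs)

# Global image embedding: the pooled output if the model exposes one,
# otherwise the first (CLS) token of the last hidden state.
if getattr(outputs, "pooler_output", None) is not None:
    embedding = outputs.pooler_output
else:
    embedding = outputs.last_hidden_state[:, 0]
print(embedding.shape)  # (1, hidden_dim)
```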
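
A quick environment check for the PyTorch requirement (the exact dependency list lives in the repo):

```python
# Fails early if the installed PyTorch predates the required 2.7.1.
from packaging import version
import torch

required = "2.7.1"
installed = torch.__version__.split("+")[0]  # drop local tags like +cu126
assert version.parse(installed) >= version.parse(required), (
    f"DINOv3 training/eval expects PyTorch >= {required}, found {torch.__version__}"
)
```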
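
An illustrative sketch of the PCA-of-patch-features idea from the notebooks: project patch tokens onto their top-3 principal components and view them as an RGB map. The checkpoint id, the number of special tokens, and the square patch grid are assumptions.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "facebook/dinov3-vits16-pretrain-lvd1689m"  # assumed id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

inputs = processor(images=Image.open("example.jpg"), return_tensors="pt")
with torch.inference_mode():
    tokens = model(**inputs).last_hidden_state[0]  # (num_tokens, dim)

num_special = 1  # assumed: CLS only; increase if the model prepends register tokens
patches = tokens[num_special:]

# Top-3 principal components of the patch features, rescaled to [0, 1]
# so the projection can be displayed as an RGB image.
patches = patches - patches.mean(dim=0)
_, _, V = torch.pca_lowrank(patches, q=3)
rgb = patches @ V
rgb = (rgb - rgb.min(0).values) / (rgb.max(0).values - rgb.min(0).values)
side = int(rgb.shape[0] ** 0.5)  # assumes a square patch grid
rgb_map = rgb.reshape(side, side, 3)
```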
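
On the Gram-anchoring stage: the paper anchors the student's patch-wise similarity (Gram) matrix to that of an earlier "Gram teacher" so dense features do not degrade over long pretraining. Below is a hedged sketch of such a loss, not the repo's implementation; the normalization and exact loss form are assumptions.

```python
import torch
import torch.nn.functional as F

def gram_anchoring_loss(student_patches: torch.Tensor,
                        gram_teacher_patches: torch.Tensor) -> torch.Tensor:
    # Inputs: (batch, num_patches, dim) patch features; the teacher tensor
    # comes from a frozen earlier checkpoint ("Gram teacher").
    s = F.normalize(student_patches, dim=-1)
    t = F.normalize(gram_teacher_patches, dim=-1)
    gram_s = s @ s.transpose(1, 2)  # (batch, P, P) pairwise patch similarities
    gram_t = t @ t.transpose(1, 2)
    return (gram_s - gram_t).pow(2).mean()  # Frobenius-style mismatch penalty
```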