Hasty Briefs (beta)

  • #deep-learning
  • #computer-vision
  • #transformers
  • DINOv3 backbones are now available on Hugging Face Hub and supported by the Hugging Face Transformers library.
  • DINOv3 models produce high-quality dense features and perform well on various vision tasks without fine-tuning.
  • Models include ViT and ConvNeXt architectures pretrained on datasets like LVD-1689M (web images) and SAT-493M (satellite imagery).
  • Instructions provided for loading models via PyTorch Hub and Hugging Face Transformers.
  • Example code snippets demonstrate how to extract image embeddings using DINOv3 models.
  • Training and evaluation require PyTorch >= 2.7.1 plus the additional dependencies listed in the repository.
  • Notebooks available for tasks like PCA of patch features, foreground segmentation, and dense/sparse matching.
  • Training DINOv3 proceeds in stages: pretraining, Gram anchoring, and high-resolution adaptation.
  • DINOv3 code and model weights are released under the DINOv3 License.
  • Citation provided for referencing the DINOv3 paper.
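The embedding-extraction workflow mentioned above can be sketched in PyTorch. Since the real DINOv3 checkpoints are gated on the Hugging Face Hub, the sketch below uses a stand-in module that only mimics the expected output layout (one CLS token, register tokens, then one token per patch); the `from_pretrained` repo id in the comment is an assumption, so check the Hub for the exact model names.

```python
import torch
from torch import nn

# Stand-in for a DINOv3 ViT-S/16 backbone: a dummy module reproducing the
# assumed token layout (1 CLS token + 4 register tokens + one token per
# 16x16 patch), NOT the real model.
class DummyDinov3(nn.Module):
    def __init__(self, dim=384, patch=16, n_registers=4):
        super().__init__()
        self.n_registers = n_registers
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.registers = nn.Parameter(torch.zeros(1, n_registers, dim))

    def forward(self, pixel_values):
        b = pixel_values.shape[0]
        # Patchify and flatten to a token sequence: [B, N, C]
        patches = self.proj(pixel_values).flatten(2).transpose(1, 2)
        return torch.cat([self.cls.expand(b, -1, -1),
                          self.registers.expand(b, -1, -1), patches], dim=1)

model = DummyDinov3()
# With access to the real (gated) weights, the loading step would look
# roughly like this instead (repo id is an assumption):
#   from transformers import AutoImageProcessor, AutoModel
#   model = AutoModel.from_pretrained("facebook/dinov3-vits16-pretrain-lvd1689m")

image = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB batch of one
with torch.no_grad():
    tokens = model(image)            # [1, 1 + 4 + 196, 384]

cls_embedding = tokens[:, 0]                         # global image embedding
patch_features = tokens[:, 1 + model.n_registers:]   # dense per-patch features

print(cls_embedding.shape)   # torch.Size([1, 384])
print(patch_features.shape)  # torch.Size([1, 196, 384])
```

The CLS embedding serves sparse tasks such as retrieval or linear-probe classification, while the per-patch features feed the dense tasks the notebooks cover (PCA visualization, segmentation, matching).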