Hasty Briefs (beta)

  • #deep-learning
  • #computer-vision
  • #transformers
  • DINOv3 backbones are now available on Hugging Face Hub and supported by the Hugging Face Transformers library.
  • DINOv3 models produce high-quality dense features and perform well on various vision tasks without fine-tuning.
  • Models include ViT and ConvNeXt architectures pretrained on datasets like LVD-1689M (web images) and SAT-493M (satellite imagery).
  • Instructions provided for loading models via PyTorch Hub and Hugging Face Transformers.
  • Example code snippets demonstrate how to extract image embeddings using DINOv3 models.
  • Training and evaluation require PyTorch >= 2.7.1 plus the additional dependencies listed in the repository.
  • Notebooks available for tasks like PCA of patch features, foreground segmentation, and dense/sparse matching.
  • Training DINOv3 proceeds in stages: pretraining, Gram anchoring, and high-resolution adaptation.
  • DINOv3 code and model weights are released under the DINOv3 License.
  • Citation provided for referencing the DINOv3 paper.
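The embedding-extraction workflow mentioned above can be sketched in PyTorch. Since the real DINOv3 checkpoints are gated on the Hugging Face Hub, the sketch below uses a stand-in module that only mimics the expected output layout (one CLS token, register tokens, then one token per patch); the `from_pretrained` repo id in the comment is an assumption, so check the Hub for the exact model names.

```python
import torch
from torch import nn

# Stand-in for a DINOv3 ViT-S/16 backbone: a dummy module reproducing the
# assumed token layout (1 CLS token + 4 register tokens + one token per
# 16x16 patch), NOT the real model.
class DummyDinov3(nn.Module):
    def __init__(self, dim=384, patch=16, n_registers=4):
        super().__init__()
        self.n_registers = n_registers
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.registers = nn.Parameter(torch.zeros(1, n_registers, dim))

    def forward(self, pixel_values):
        b = pixel_values.shape[0]
        # Patchify and flatten to a token sequence: [B, N, C]
        patches = self.proj(pixel_values).flatten(2).transpose(1, 2)
        return torch.cat([self.cls.expand(b, -1, -1),
                          self.registers.expand(b, -1, -1), patches], dim=1)

model = DummyDinov3()
# With access to the real (gated) weights, the loading step would look
# roughly like this instead (repo id is an assumption):
#   from transformers import AutoImageProcessor, AutoModel
#   model = AutoModel.from_pretrained("facebook/dinov3-vits16-pretrain-lvd1689m")

image = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB batch of one
with torch.no_grad():
    tokens = model(image)            # [1, 1 + 4 + 196, 384]

cls_embedding = tokens[:, 0]                         # global image embedding
patch_features = tokens[:, 1 + model.n_registers:]   # dense per-patch features

print(cls_embedding.shape)   # torch.Size([1, 384])
print(patch_features.shape)  # torch.Size([1, 196, 384])
```

The CLS embedding serves sparse tasks such as retrieval or linear-probe classification, while the per-patch features feed the dense tasks the notebooks cover (PCA visualization, segmentation, matching).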