Hasty Briefs (beta)

Depth Anything 3

8 days ago
  • #depth-estimation
  • #computer-vision
  • #transformer
  • Depth Anything 3 (DA3) predicts spatially consistent geometry from multiple visual inputs, with or without known camera poses.
  • Key insights: a single plain transformer (e.g., a vanilla DINOv2 encoder) suffices as the backbone, and a single depth-ray prediction target removes the need for complex multi-task learning.
  • Achieves detail and generalization comparable to Depth Anything 2 (DA2) through teacher-student training.
  • Establishes a new visual geometry benchmark covering camera pose estimation, any-view geometry, and visual rendering.
  • Sets a new state of the art on that benchmark, surpassing VGGT by 35.7% in camera pose accuracy and 23.6% in geometric accuracy.
  • Outperforms DA2 in monocular depth estimation.
  • All models trained exclusively on public academic datasets.
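To make the depth-ray target more concrete, here is a minimal sketch of how a per-pixel depth value and a per-pixel ray (an origin plus a direction) can be combined into a 3D point, using point = origin + depth * direction. This is an illustration under stated assumptions, not DA3's actual code: the pinhole camera model, the z = 1 ray convention, and the helper names `pixel_ray_direction`, `rotate`, and `unproject` are all hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray_direction(u: float, v: float, fx: float, fy: float,
                        cx: float, cy: float) -> Vec3:
    """Camera-frame ray through pixel (u, v) for an assumed pinhole camera.

    The direction is left unnormalised with z = 1, so multiplying it by a
    z-depth value gives the 3D point in the camera frame directly.
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def rotate(R: Sequence[Sequence[float]], d: Vec3) -> Vec3:
    """Apply a 3x3 camera-to-world rotation matrix to a direction vector."""
    return tuple(sum(R[i][j] * d[j] for j in range(3)) for i in range(3))


def unproject(depth: Sequence[Sequence[float]], origin: Vec3,
              R: Sequence[Sequence[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[List[Vec3]]:
    """Turn a depth map plus per-pixel rays into world-space 3D points.

    Every pixel shares the camera centre `origin`; its ray direction is the
    rotated pinhole direction. The point is origin + depth * direction.
    """
    points = []
    for v, row in enumerate(depth):
        out_row = []
        for u, z in enumerate(row):
            d = rotate(R, pixel_ray_direction(u, v, fx, fy, cx, cy))
            out_row.append(tuple(origin[k] + z * d[k] for k in range(3)))
        points.append(out_row)
    return points


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    depth_map = [[2.0, 2.0], [2.0, 2.0]]  # toy 2x2 depth map
    cloud = unproject(depth_map, (0.0, 0.0, 0.0), identity, 1.0, 1.0, 0.0, 0.0)
    for row in cloud:
        print(row)
```

In this sketch the rays are derived from known intrinsics and pose. The appeal of predicting rays alongside depth is that the same formula applies when the camera is unknown, since the ray map itself carries the camera information.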