Depth Anything 3
- #depth-estimation
- #computer-vision
- #transformer
- Depth Anything 3 (DA3) predicts spatially consistent geometry from multiple visual inputs, with or without known camera poses.
- Key insights: a single plain transformer (e.g., a vanilla DINOv2 encoder) suffices as the backbone, and a single depth-ray prediction target removes the need for complex multi-task learning.
- Matches the detail and generalization of Depth Anything 2 (DA2) via a teacher-student training paradigm.
- Establishes a new visual geometry benchmark covering camera pose estimation, any-view geometry, and visual rendering.
- Sets a new state-of-the-art, surpassing VGGT by 35.7% in camera pose accuracy and 23.6% in geometric accuracy.
- Outperforms DA2 in monocular depth estimation.
- All models trained exclusively on public academic datasets.
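The depth-ray target pairs a per-pixel depth with a per-pixel ray, so 3D geometry follows directly by unprojecting each pixel along its ray. A minimal sketch of that unprojection step (function name, array shapes, and the toy inputs are assumptions for illustration, not the paper's API):

```python
import numpy as np

def unproject_depth_rays(depth, ray_origins, ray_dirs):
    """Lift a per-pixel depth map to 3D points using a per-pixel ray map.

    depth:       (H, W) predicted depth along each ray.
    ray_origins: (H, W, 3) per-pixel ray origins (camera centers).
    ray_dirs:    (H, W, 3) unit ray directions.
    Returns:     (H, W, 3) 3D points, one per pixel.
    """
    # Broadcast depth over the xyz axis: point = origin + depth * direction.
    return ray_origins + depth[..., None] * ray_dirs

# Toy example: a 2x2 "image" whose rays all start at the origin
# and point along +z, so each point's z equals its depth.
depth = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
origins = np.zeros((2, 2, 3))
dirs = np.zeros((2, 2, 3))
dirs[..., 2] = 1.0
points = unproject_depth_rays(depth, origins, dirs)
```

Because both depth and rays are dense per-pixel maps, a plain transformer can predict them with the same output head, which is what lets DA3 avoid separate task-specific branches.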