Hasty Briefs (beta)

Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

2 days ago
  • #transformer-architecture
  • #4D-reconstruction
  • #computer-vision
  • D4RT is a feedforward model designed for reconstructing dynamic scenes from video.
  • It uses a unified transformer architecture to infer depth, spatio-temporal correspondence, and camera parameters.
  • A novel querying mechanism lets the model efficiently probe the 3D position of points across space and time.
  • D4RT reports state-of-the-art results on 4D reconstruction tasks while keeping training lightweight and scalable.
  • The architecture includes a global self-attention encoder and a lightweight decoder for flexible scene representation.
  • Capabilities include 3D tracking, 3D reconstruction, and all-pixels tracking for holistic scene reconstruction.
  • The project was led by MS with contributions from multiple authors in model design, implementation, and evaluation.
  • Acknowledgments include colleagues and advisors who provided feedback, support, and resources.
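
To make the encoder/decoder split and the querying idea from the summary more concrete, here is a minimal NumPy sketch. It is not the D4RT implementation: the class `ToyQueryModel`, the `(u, v, t)` query layout, all dimensions, and the random weights are assumptions made for illustration only, and the outputs are not meaningful 3D positions. It only shows the general pattern of encoding a video once with global self-attention and then probing it with independent queries through a light cross-attention decoder.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: each row of q attends over all rows of k/v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


class ToyQueryModel:
    """Hypothetical stand-in for an encoder plus query decoder (untrained)."""

    def __init__(self, token_dim=16, dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.1, (token_dim, dim))   # token embedding
        self.w_query = rng.normal(0.0, 0.1, (3, dim))        # (u, v, t) embedding
        self.w_out = rng.normal(0.0, 0.1, (dim, 3))          # head producing (x, y, z)

    def encode(self, video_tokens):
        # One residual layer of global self-attention over all video tokens.
        x = video_tokens @ self.w_in
        return x + attention(x, x, x)

    def decode(self, scene, queries):
        # Each query cross-attends to the encoded scene independently of the others.
        q = queries @ self.w_query
        return attention(q, scene, scene) @ self.w_out


# Encode a dummy "video" of 40 tokens once, then probe it with two queries:
# the same pixel (u, v) at two different times t (assumed query layout).
model = ToyQueryModel()
tokens = np.random.default_rng(1).normal(size=(40, 16))
scene = model.encode(tokens)
queries = np.array([[0.25, 0.5, 0.0],
                    [0.25, 0.5, 1.0]])
points = model.decode(scene, queries)  # shape (2, 3): one 3D vector per query
```

In this sketch the expensive step (`encode`) runs once per video, while `decode` can be called with as few or as many queries as needed, which is one plausible reading of why a query-based decoder can be described as efficient and flexible.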