Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
2 days ago
- #transformer-architecture
- #4D-reconstruction
- #computer-vision
- D4RT is a feedforward model designed for reconstructing dynamic scenes from video.
- It uses a unified transformer architecture to infer depth, spatio-temporal correspondence, and camera parameters.
- A novel querying mechanism lets the model efficiently probe 3D positions in space and time.
- D4RT reports state-of-the-art results on 4D reconstruction tasks while keeping training lightweight and scalable.
- The architecture includes a global self-attention encoder and a lightweight decoder for flexible scene representation.
- Capabilities include 3D tracking, 3D reconstruction, and tracking of all pixels for holistic scene reconstruction.
- The project was led by MS; multiple authors contributed to model design, implementation, and evaluation.
- Acknowledgments include colleagues and advisors who provided feedback, support, and resources.
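
The encoder/decoder split and the querying mechanism summarized above can be illustrated with a toy sketch. This is not the D4RT implementation: the feature width, the single attention layer, the random weights, and the `(u, v, t)` query layout are all assumptions made here for illustration. It only shows the general pattern of encoding a video once with global self-attention and then probing it with independent queries through a lightweight cross-attention decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # hypothetical feature width, not taken from the source


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; each row of q attends over all rows of k."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


# Random weights stand in for trained parameters (illustrative only).
W_patch = rng.normal(size=(48, D)) * 0.1
W_query = rng.normal(size=(3, D)) * 0.1
W_out = rng.normal(size=(D, 3)) * 0.1


def encode(video_patches):
    """(N, 48) patches from all frames -> (N, D) tokens via one global self-attention layer."""
    x = video_patches @ W_patch
    return x + attention(x, x, x)


def decode(tokens, queries):
    """(Q, 3) queries, assumed here to be (u, v, t) -> (Q, 3) predicted 3D positions."""
    q = queries @ W_query
    h = q + attention(q, tokens, tokens)  # queries cross-attend to the scene tokens
    return h @ W_out


# Toy video: 4 frames x 16 patches, 48 values per patch.
patches = rng.normal(size=(4 * 16, 48))
tokens = encode(patches)                # computed once per video
queries = rng.uniform(size=(5, 3))      # five arbitrary space-time probes
points = decode(tokens, queries)        # one 3D position per query
```

Because the queries in this sketch never attend to each other, each one can be decoded on its own or in any batch size against the same encoded tokens, which is the property that makes a query-based decoder cheap to probe.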