D4RT: Teaching AI to see the world in four dimensions

4 months ago

D4RT is a unified AI model for 4D scene reconstruction and tracking across space and time.
It enables machines to understand dynamic scenes from 2D videos by tracking pixels in 3D space and time.
D4RT combines scene reconstruction and tracking in a single efficient framework, improving how AI perceives dynamic scenes.
The model uses an encoder-decoder Transformer architecture with a flexible querying mechanism for efficiency.
D4RT runs 18x to 300x faster than previous methods and processes a one-minute video in about 5 seconds.
Its real-time capabilities suit applications in robotics, augmented reality, and spatial computing.
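
The summary does not say how the querying mechanism works internally. In encoder-decoder Transformers, a decoder commonly reads the encoder output through scaled dot-product cross-attention, so the sketch below illustrates that general pattern: the scene is encoded once, and each query is then answered independently against the same cached tokens. Everything here is an assumption for illustration, including the names `cross_attend`, `scene_keys`, and `scene_values` and all of the toy numbers; it is not D4RT's actual implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product cross-attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Hypothetical encoder output (toy numbers): computed once per video, then reused.
scene_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scene_values = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

# Hypothetical queries: each is decoded independently of the others,
# so more queries can be added without re-encoding the video.
queries = [[4.0, 0.0], [0.0, 4.0]]
answers = [cross_attend(q, scene_keys, scene_values) for q in queries]
```

Because no query depends on another query's result, this kind of decoding can be batched or restricted to only the points of interest, which is one plausible reading of why a flexible querying mechanism helps efficiency.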
