Tencent Open-Sourced a 3D World Model
7 days ago
- #Video Diffusion
- #AI Research
- #3D Reconstruction
- HunyuanWorld-Voyager is a video diffusion framework that generates 3D point-cloud sequences from a single image with user-defined camera paths.
- It produces 3D-consistent scene videos with aligned depth and RGB frames, enabling efficient 3D reconstruction (see the back-projection sketch after this list).
- The framework includes two key components: World-Consistent Video Diffusion and Long-Range World Exploration.
- A scalable data engine automates camera pose estimation and depth prediction, enabling large-scale training without manual 3D annotations (a rough pipeline sketch follows the list).
- The dataset includes over 100,000 video clips from real-world captures and synthetic Unreal Engine renders.
- Voyager outperforms competing methods on the WorldScore benchmark, including its Camera Control and 3D Consistency metrics.
- System requirements include an NVIDIA GPU with at least 60 GB of memory for 540p generation; the code has been tested on Linux.
- Installation involves setting up a conda environment, installing dependencies, and downloading pretrained models.
- Commands are provided for generating videos with custom camera paths and prompts (an illustrative camera-path sketch is included after this list).
- Parallel inference is supported via xDiT on multi-GPU clusters, reducing latency.
- A Gradio demo is available for interactive use, and the data engine is released for scalable training data generation.
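Because the model emits depth maps aligned with its RGB frames, turning the output into geometry is essentially standard back-projection. The sketch below shows that generic operation under an assumed pinhole camera model with made-up intrinsics; it is not code from the Voyager repository.

```python
import numpy as np

def rgbd_to_pointcloud(rgb, depth, fx, fy, cx, cy):
    """Back-project an aligned RGB-D frame into an (N, 6) array of XYZRGB points.

    rgb:   (H, W, 3) uint8 image
    depth: (H, W) float32 depth in meters, 0 where invalid
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed values below)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=-1)
    colors = rgb[valid].astype(np.float32) / 255.0
    return np.concatenate([xyz, colors], axis=-1)

# Toy data; in practice rgb/depth would come from the model's output frames.
rgb = np.random.randint(0, 256, (540, 960, 3), dtype=np.uint8)
depth = np.random.uniform(0.5, 5.0, (540, 960)).astype(np.float32)
points = rgbd_to_pointcloud(rgb, depth, fx=500.0, fy=500.0, cx=480.0, cy=270.0)
print(points.shape)  # (N, 6): xyz + rgb
```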
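The release describes the data engine only at a high level. As a rough sketch of what such a pipeline does, the code below attaches estimated camera poses and predicted depth to every raw clip so that no manual 3D annotation is needed; the estimator functions are placeholders standing in for whatever pose and depth models are actually used.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class AnnotatedClip:
    frames: np.ndarray   # (T, H, W, 3) RGB frames
    poses: np.ndarray    # (T, 4, 4) estimated camera-to-world matrices
    depths: np.ndarray   # (T, H, W) predicted depth maps

def estimate_poses(frames: np.ndarray) -> np.ndarray:
    """Placeholder for an off-the-shelf SfM / visual-odometry pose estimator."""
    return np.stack([np.eye(4) for _ in range(len(frames))])

def estimate_depth(frames: np.ndarray) -> np.ndarray:
    """Placeholder for a monocular depth prediction model."""
    return np.ones(frames.shape[:3], dtype=np.float32)

def annotate_clip(frames: np.ndarray) -> AnnotatedClip:
    """Turn a raw video clip into training data with no manual 3D labels."""
    return AnnotatedClip(frames=frames,
                         poses=estimate_poses(frames),
                         depths=estimate_depth(frames))

# Stand-in for real-world captures and Unreal Engine renders.
clips: List[np.ndarray] = [np.zeros((16, 270, 480, 3), dtype=np.uint8)]
dataset = [annotate_clip(c) for c in clips]
print(len(dataset), dataset[0].poses.shape)
```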
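User-defined camera paths are typically supplied as per-frame camera extrinsics. The sketch below builds a simple forward-and-turn trajectory as 4x4 camera-to-world matrices; the function and its output layout are illustrative assumptions, not Voyager's actual input format.

```python
import numpy as np

def forward_turn_path(num_frames=49, step=0.05, total_yaw_deg=30.0):
    """Build a simple camera path: move forward along -z while slowly yawing.

    Returns a list of 4x4 camera-to-world matrices, one per frame.
    """
    poses = []
    for i in range(num_frames):
        yaw = np.deg2rad(total_yaw_deg * i / max(num_frames - 1, 1))
        c, s = np.cos(yaw), np.sin(yaw)
        # Rotation about the y (up) axis.
        R = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
        # Translate the camera forward along its current viewing direction.
        t = R @ np.array([0.0, 0.0, -step * i])
        pose = np.eye(4)
        pose[:3, :3] = R
        pose[:3, 3] = t
        poses.append(pose)
    return poses

camera_path = forward_turn_path()
print(len(camera_path), camera_path[-1][:3, 3])  # 49 frames, final camera position
```

A trajectory like this would be serialized in whatever pose format the generation command expects and passed alongside the input image and text prompt.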