Tencent Open-Sourced a 3D World Model
7 days ago
- #Video Diffusion
- #AI Research
- #3D Reconstruction
- HunyuanWorld-Voyager is a video diffusion framework that generates 3D point-cloud sequences from a single image with user-defined camera paths.
- It produces 3D-consistent scene videos with aligned depth and RGB frames, enabling efficient 3D reconstruction (see the back-projection sketch after this list).
- The framework includes two key components: World-Consistent Video Diffusion and Long-Range World Exploration.
- A scalable data engine automates camera pose estimation and depth prediction, enabling large-scale training without manual 3D annotations (a rough pipeline sketch follows the list).
- The dataset includes over 100,000 video clips from real-world captures and synthetic Unreal Engine renders.
- Voyager outperforms competing methods on the WorldScore benchmark, including its Camera Control and 3D Consistency metrics.
- System requirements include an NVIDIA GPU with at least 60 GB of memory for 540p generation; the code has been tested on Linux.
- Installation involves setting up a conda environment, installing dependencies, and downloading pretrained models.
- Commands are provided for generating videos with custom camera paths and prompts (an illustrative camera-path sketch is included after this list).
- Parallel inference is supported via xDiT on multi-GPU clusters, reducing latency.
- A Gradio demo is available for interactive use, and the data engine is released for scalable training data generation.
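Because the model emits depth maps aligned with its RGB frames, turning the output into geometry is essentially standard back-projection. The sketch below shows that generic operation under an assumed pinhole camera model with made-up intrinsics; it is not code from the Voyager repository.

```python
import numpy as np

def rgbd_to_pointcloud(rgb, depth, fx, fy, cx, cy):
    """Back-project an aligned RGB-D frame into an (N, 6) array of XYZRGB points.

    rgb:   (H, W, 3) uint8 image
    depth: (H, W) float32 depth in meters, 0 where invalid
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed values below)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=-1)
    colors = rgb[valid].astype(np.float32) / 255.0
    return np.concatenate([xyz, colors], axis=-1)

# Toy data; in practice rgb/depth would come from the model's output frames.
rgb = np.random.randint(0, 256, (540, 960, 3), dtype=np.uint8)
depth = np.random.uniform(0.5, 5.0, (540, 960)).astype(np.float32)
points = rgbd_to_pointcloud(rgb, depth, fx=500.0, fy=500.0, cx=480.0, cy=270.0)
print(points.shape)  # (N, 6): xyz + rgb
```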
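The release describes the data engine only at a high level. As a rough sketch of what such a pipeline does, the code below attaches estimated camera poses and predicted depth to every raw clip so that no manual 3D annotation is needed; the estimator functions are placeholders standing in for whatever pose and depth models are actually used.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class AnnotatedClip:
    frames: np.ndarray   # (T, H, W, 3) RGB frames
    poses: np.ndarray    # (T, 4, 4) estimated camera-to-world matrices
    depths: np.ndarray   # (T, H, W) predicted depth maps

def estimate_poses(frames: np.ndarray) -> np.ndarray:
    """Placeholder for an off-the-shelf SfM / visual-odometry pose estimator."""
    return np.stack([np.eye(4) for _ in range(len(frames))])

def estimate_depth(frames: np.ndarray) -> np.ndarray:
    """Placeholder for a monocular depth prediction model."""
    return np.ones(frames.shape[:3], dtype=np.float32)

def annotate_clip(frames: np.ndarray) -> AnnotatedClip:
    """Turn a raw video clip into training data with no manual 3D labels."""
    return AnnotatedClip(frames=frames,
                         poses=estimate_poses(frames),
                         depths=estimate_depth(frames))

# Stand-in for real-world captures and Unreal Engine renders.
clips: List[np.ndarray] = [np.zeros((16, 270, 480, 3), dtype=np.uint8)]
dataset = [annotate_clip(c) for c in clips]
print(len(dataset), dataset[0].poses.shape)
```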
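User-defined camera paths are typically supplied as per-frame camera extrinsics. The sketch below builds a simple forward-and-turn trajectory as 4x4 camera-to-world matrices; the function and its output layout are illustrative assumptions, not Voyager's actual input format.

```python
import numpy as np

def forward_turn_path(num_frames=49, step=0.05, total_yaw_deg=30.0):
    """Build a simple camera path: move forward along -z while slowly yawing.

    Returns a list of 4x4 camera-to-world matrices, one per frame.
    """
    poses = []
    for i in range(num_frames):
        yaw = np.deg2rad(total_yaw_deg * i / max(num_frames - 1, 1))
        c, s = np.cos(yaw), np.sin(yaw)
        # Rotation about the y (up) axis.
        R = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
        # Translate the camera forward along its current viewing direction.
        t = R @ np.array([0.0, 0.0, -step * i])
        pose = np.eye(4)
        pose[:3, :3] = R
        pose[:3, 3] = t
        poses.append(pose)
    return poses

camera_path = forward_turn_path()
print(len(camera_path), camera_path[-1][:3, 3])  # 49 frames, final camera position
```

A trajectory like this would be serialized in whatever pose format the generation command expects and passed alongside the input image and text prompt.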