
Tencent Open Sourced a 3D World Model

7 days ago
  • #Video Diffusion
  • #AI Research
  • #3D Reconstruction
  • HunyuanWorld-Voyager is a video diffusion framework that generates 3D point-cloud sequences from a single image along a user-defined camera path.
  • It produces 3D-consistent scene videos with aligned RGB and depth frames, which makes 3D reconstruction efficient.
  • The framework includes two key components: World-Consistent Video Diffusion and Long-Range World Exploration.
  • A scalable data engine automates camera pose estimation and depth prediction, enabling large-scale training without manual 3D annotations.
  • The dataset includes over 100,000 video clips from real-world captures and synthetic Unreal Engine renders.
  • Voyager outperforms other methods on metrics such as WorldScore, Camera Control, and 3D Consistency.
  • System requirements include an NVIDIA GPU with at least 60 GB of memory for 540p resolution; the framework was tested on Linux.
  • Installation involves setting up a conda environment, installing dependencies, and downloading pretrained models.
  • Commands are provided for generating videos with custom camera paths and prompts.
  • Parallel inference is supported via xDiT on multi-GPU clusters, which reduces latency.
  • A Gradio demo is available for interactive use, and the data engine is released for scalable training data generation.
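
To illustrate why aligned depth and RGB frames help with 3D reconstruction, here is a minimal sketch of back-projecting one frame into a colored point cloud. It is not Voyager's code: it assumes a simple pinhole camera, and the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example. The points come out in the camera's own frame; placing them in a shared world frame would also need the camera pose for that frame.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def backproject(
    depth: List[List[float]],
    rgb: List[List[Color]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift every pixel with valid depth to a colored 3D point (camera frame).

    Assumes a pinhole model: pixel (u, v) at depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, (depth_row, rgb_row) in enumerate(zip(depth, rgb)):
        for u, (z, color) in enumerate(zip(depth_row, rgb_row)):
            if z <= 0:  # treat non-positive depth as "no measurement"
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, color))
    return points


# Toy 2x2 frame: one pixel has no depth and is skipped.
depth_frame = [[2.0, 2.0],
               [0.0, 4.0]]
rgb_frame = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]

cloud = backproject(depth_frame, rgb_frame, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
for point in cloud:
    print(point)
```

Because depth and color are already aligned per pixel, each 3D point picks up its color directly, with no separate matching step. A real pipeline would use array operations rather than Python loops, but the geometry is the same.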