NVIDIA Cosmos 3

6 hours ago

NVIDIA Cosmos 3 is a frontier foundation model for physical AI, unifying physical reasoning, world generation, and action generation in a single open model.
It features a Mixture-of-Transformers architecture with a reasoner tower for multimodal interpretation and a generator tower for physics-aware video and action outputs.
Two model sizes are available: Cosmos 3 Nano (16B parameters) for efficient, real-time inference and Cosmos 3 Super (64B parameters) for maximum quality and capability.
The model supports diverse input and output modalities, including text, image, video, and action, for applications like robotics, autonomous vehicles, and smart spaces.
NVIDIA is open-sourcing model checkpoints, datasets, training scripts, and deployment tools to foster reproducible physical AI development.
Six synthetic data generation datasets covering robotics, physics, and driving are released for post-training and adaptation.
The NVIDIA Cosmos Human Evaluation (HUE) framework assesses video generation quality across physical AI domains using objective fact verification.
Cosmos 3 leads in benchmarks such as VANTAGE-Bench, PAI-Bench, and RoboLab for reasoning and generation tasks.
Training recipes include supervised fine-tuning and action post-training for customizing the model to specific domains and embodiments.
Deployment is optimized through NVIDIA NIM microservices, with quantization, vLLM support, and efficient video sampling for accelerated inference.
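
To put the two model sizes and the mention of quantization in perspective, here is a rough sketch of how much memory the weights alone would occupy at a few precisions. The parameter counts are taken from the summary above; the precisions (16-bit, 8-bit, 4-bit) are illustrative assumptions, not formats confirmed for Cosmos 3, and the estimate ignores activations, caches, and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate for the two model sizes.
# Parameter counts come from the summary above; the precisions listed here
# are illustrative assumptions, not formats confirmed for Cosmos 3.

PARAMS = {
    "Cosmos 3 Nano": 16e9,
    "Cosmos 3 Super": 64e9,
}

BYTES_PER_PARAM = {
    "16-bit": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the storage needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def memory_table() -> dict:
    """Map (model, precision) to estimated weight memory in GB."""
    return {
        (model, precision): weight_memory_gb(n, b)
        for model, n in PARAMS.items()
        for precision, b in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for (model, precision), gb in memory_table().items():
        print(f"{model:15s} {precision:>6s}: {gb:6.1f} GB")
```

Under these assumptions, the 64B model at 4 bits per parameter needs about as much weight storage as the 16B model at 16 bits (roughly 32 GB each), which is one reason quantization matters for deploying the larger variant.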
