NVIDIA Cosmos 3
6 hours ago
- #Autonomous Vehicles
- #Artificial Intelligence
- #Robotics
- NVIDIA Cosmos 3 is a frontier foundation model for physical AI, unifying physical reasoning, world generation, and action generation in a single open model.
- It features a Mixture-of-Transformers architecture with a reasoner tower for multimodal interpretation and a generator tower for physics-aware video and action outputs.
- Two model sizes are available: Cosmos 3 Nano (16B parameters) for efficient, real-time inference and Cosmos 3 Super (64B parameters) for maximum quality and capability.
- The model supports diverse input and output modalities, including text, image, video, and action, for applications like robotics, autonomous vehicles, and smart spaces.
- NVIDIA is open-sourcing model checkpoints, datasets, training scripts, and deployment tools to foster reproducible physical AI development.
- Six datasets for synthetic data generation, covering robotics, physics, and driving, are released for post-training and adaptation.
- The NVIDIA Cosmos Human Evaluation (HUE) framework uses objective fact verification to assess video generation quality across physical AI domains.
- Cosmos 3 leads benchmarks such as VANTAGE-Bench, PAI-Bench, and RoboLab on reasoning and generation tasks.
- Training recipes include supervised fine-tuning and action post-training for customizing the model to specific domains and embodiments.
- Deployment is optimized through NVIDIA NIM microservices, with features such as quantization, vLLM support, and efficient video sampling for accelerated inference.
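
To put the two model sizes and the quantization point in perspective, here is a rough back-of-the-envelope sketch of weight memory, computed as parameter count times bytes per parameter. The parameter counts are the ones stated above. The precisions (`bf16`, `fp8`, `int4`) and the helper names are illustrative assumptions, not formats or APIs confirmed for Cosmos 3, and the estimate covers weights only, not activations or caches.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts come from the summary above; the precisions below are
# illustrative assumptions, not formats confirmed for Cosmos 3.

PARAMS = {"Cosmos 3 Nano": 16e9, "Cosmos 3 Super": 64e9}
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in GB (10^9 bytes), ignoring activations and caches."""
    return num_params * bytes_per_param / 1e9


def estimate_all() -> dict:
    """Map (model, precision) to the estimated weight memory in GB."""
    return {
        (model, precision): weight_memory_gb(n, b)
        for model, n in PARAMS.items()
        for precision, b in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for (model, precision), gb in estimate_all().items():
        print(f"{model:15s} {precision:5s} ~{gb:.0f} GB")
```

Under these assumptions, the 16B model needs about 32 GB for weights at 2 bytes per parameter, while the 64B model needs about 128 GB at the same precision and about 32 GB at 4 bits per parameter. Real deployments need additional memory on top of this.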