NVIDIA Cosmos 3

4 hours ago
Tags: #Autonomous Vehicles, #Artificial Intelligence, #Robotics
  • NVIDIA Cosmos 3 is a frontier foundation model for physical AI, unifying physical reasoning, world generation, and action generation in a single open model.
  • It features a Mixture-of-Transformers architecture with a reasoner tower for multimodal interpretation and a generator tower for physics-aware video and action outputs.
  • Two model sizes are available: Cosmos 3 Nano (16B parameters) for efficient, real-time inference and Cosmos 3 Super (64B parameters) for maximum quality and capability.
  • The model supports diverse input and output modalities, including text, image, video, and action, for applications like robotics, autonomous vehicles, and smart spaces.
  • NVIDIA is open-sourcing model checkpoints, datasets, training scripts, and deployment tools to foster reproducible physical AI development.
  • Six datasets for synthetic data generation, covering robotics, physics, and driving, are released for post-training and adaptation.
  • The NVIDIA Cosmos Human Evaluation (HUE) framework assesses video generation quality across physical AI domains through objective fact verification.
  • Cosmos 3 is reported to lead benchmarks such as VANTAGE-Bench, PAI-Bench, and RoboLab on reasoning and generation tasks.
  • Training recipes include supervised fine-tuning and action post-training for customizing the model to specific domains and embodiments.
  • Deployment is optimized through NVIDIA NIM microservices, with quantization, vLLM support, and efficient video sampling for accelerated inference.
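
To give a rough sense of what the two model sizes and quantization mean for deployment, the sketch below estimates weight storage alone for the 16B and 64B variants. The precisions (`bf16`, `fp8`, `int4`) and their byte widths are illustrative assumptions, not formats confirmed for Cosmos 3, and the estimate ignores activations, caches, and runtime overhead.

```python
# Weights-only memory estimate. Ignores activations, caches, and runtime overhead.
# Parameter counts come from the summary above.
PARAMS = {"Cosmos 3 Nano": 16e9, "Cosmos 3 Super": 64e9}

# Bytes per parameter for some common precisions.
# Assumption for illustration: the release may use different formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        for precision in BYTES_PER_PARAM:
            print(f"{name:15s} {precision:5s} ~{weight_gb(count, precision):6.1f} GB")
```

Under these assumptions, the 64B model at 4-bit precision needs about as much weight memory as the 16B model at 16-bit precision, which is one reason quantization matters for serving the larger variant.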