NVIDIA Nemotron 3 Ultra

8 hours ago

Nemotron 3 Ultra is NVIDIA's most capable model with 550B total and 55B active parameters.
It uses a Mixture-of-Experts hybrid Mamba-Attention architecture, with LatentMoE for accuracy and MTP layers for faster inference.
It was pretrained in NVFP4 and post-trained with SFT, RL, and Multi-teacher On-Policy Distillation for improved accuracy.
It achieves up to 5.9x higher inference throughput than other models and supports context lengths of up to 1M tokens.
The open-source release includes pretrained, post-trained, and quantized checkpoints, along with training datasets.
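
To put the parameter counts in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 550B and 55B figures come from the brief; the helper names and the bit widths are illustrative assumptions, and the storage estimate covers raw weights only, ignoring quantization scale factors and any other metadata.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the brief.
# Helper names and bit widths are illustrative assumptions, not NVIDIA's numbers.

TOTAL_PARAMS = 550e9   # total parameters (from the brief)
ACTIVE_PARAMS = 55e9   # parameters active per token (from the brief)


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used for a single token in a sparse MoE model."""
    return active / total


def weight_bytes(params: float, bits_per_param: float) -> float:
    """Raw weight storage in bytes, ignoring scale factors and other metadata."""
    return params * bits_per_param / 8


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.0%}")

    # Hypothetical storage widths, chosen only to show the scale involved.
    for label, bits in [("16-bit", 16), ("4-bit", 4)]:
        gb = weight_bytes(TOTAL_PARAMS, bits) / 1e9
        print(f"{label} weights: ~{gb:.0f} GB")
```

Under these assumptions, about one tenth of the weights participate in each token, which is why a sparse model of this size can have per-token compute closer to that of a much smaller dense model while still requiring memory for the full parameter set.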
