NVIDIA Nemotron 3 Ultra
9 hours ago
- #LLM
- #NVIDIA
- #AI Research
- Nemotron 3 Ultra is NVIDIA's most capable model with 550B total and 55B active parameters.
- Uses a hybrid Mamba-Attention Mixture-of-Experts (MoE) architecture, with LatentMoE for accuracy and Multi-Token Prediction (MTP) layers for faster inference.
- Pretrained in NVFP4, then post-trained with supervised fine-tuning (SFT), reinforcement learning (RL), and Multi-teacher On-Policy Distillation for improved accuracy.
- Achieves up to 5.9x higher inference throughput than other models and supports context lengths of up to 1M tokens.
- Open-source release includes pretrained, post-trained, and quantized checkpoints, as well as training datasets.
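
To put the parameter counts above in perspective, here is a minimal back-of-the-envelope sketch. It computes the share of parameters active per token from the 550B and 55B figures, and estimates raw weight storage at a few precisions. The helper names and the bits-per-parameter values are illustrative assumptions: the estimate ignores scale factors, activations, and KV or state caches, and says nothing about how the released checkpoints are actually stored.

```python
# Figures taken from the summary above.
TOTAL_PARAMS = 550e9   # total parameters
ACTIVE_PARAMS = 55e9   # parameters active per token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used to process a single token."""
    return active / total


def weight_bytes(params: float, bits_per_param: float) -> float:
    """Raw weight storage in bytes (assumption: no scale factors or metadata)."""
    return params * bits_per_param / 8


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {frac:.0%}")

    # Hypothetical storage precisions, for comparison only.
    for label, bits in (("16-bit", 16), ("8-bit", 8), ("4-bit", 4)):
        gb = weight_bytes(TOTAL_PARAMS, bits) / 1e9
        print(f"{label} weights: ~{gb:,.0f} GB")
```

The point of the sparse design is visible in the first number: each token touches roughly a tenth of the weights, so per-token compute scales with the 55B active parameters while memory capacity still has to hold all 550B.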