Llama-Nemotron: Efficient Reasoning Models
- #AI
- #Open Source
- #Machine Learning
- Introduces the Llama-Nemotron series: an open family of heterogeneous reasoning models combining strong capabilities with high inference efficiency.
- Three model sizes: Nano (8B), Super (49B), Ultra (253B), competitive with state-of-the-art models like DeepSeek-R1.
- Training procedure includes neural architecture search, knowledge distillation, continued pretraining, and reasoning-focused post-training.
- First open-source models with a dynamic reasoning toggle, letting users switch between standard chat and extended reasoning modes at inference time via the system prompt.
- Release includes models under NVIDIA Open Model License, post-training dataset, and training codebases (NeMo, NeMo-Aligner, Megatron-LM).
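The reasoning toggle above is driven by the system prompt; NVIDIA's model cards describe the control phrases "detailed thinking on" / "detailed thinking off". A minimal sketch of building a request in each mode (the helper function and example prompt are illustrative, not part of an official API):

```python
# Sketch: toggling Llama-Nemotron's reasoning mode via the system prompt.
# The phrases "detailed thinking on"/"detailed thinking off" follow NVIDIA's
# published model cards; build_messages is a hypothetical helper.

def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Prepend the system prompt that toggles reasoning mode."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning mode: the model emits its chain of thought before answering.
msgs = build_messages("What is 17 * 24?", reasoning=True)
print(msgs[0]["content"])  # detailed thinking on
```

The resulting message list can be passed to any OpenAI-compatible chat endpoint or to a local `transformers` chat template; only the system prompt changes between modes.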