Nvidia trains model on 10T tokens in 4-bit precision (NVFP4)
- #AI
- #NVIDIA
- #Quantization
- AI workloads have grown exponentially, driven by large language model (LLM) deployment and by the sheer volume of tokens processed during pretraining and post-training.
- NVIDIA's NVFP4, a 4-bit floating-point format, reduces inference latency and improves throughput and energy efficiency while maintaining accuracy.
- NVFP4 is now being extended to pretraining, offering significant improvements in training efficiency and scalability.
- 4-bit quantization compresses model weights and activations to 4 bits, which requires specialized techniques to preserve accuracy (see the micro-block quantization sketch after this list).
- NVFP4's pretraining recipe combines micro-block scaling, high-precision block encoding, tensor reshaping, and stochastic rounding to keep training stable and accurate (a stochastic-rounding sketch also follows the list).
- Experiments show NVFP4 matching FP8 loss curves in large-scale pretraining, validating its effectiveness for trillion-token training runs.
- NVFP4 enables AI factories to scale more efficiently, reducing power and compute costs while accelerating model development.
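To make the micro-block scaling idea concrete, here is a minimal fake-quantization sketch in NumPy. It rounds 16-element blocks onto the FP4 E2M1 value grid {0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6} with one shared scale per block. Keeping the scale in full precision is a simplification for illustration; actual NVFP4 stores per-block scales in FP8 E4M3 plus a second-level FP32 per-tensor scale. Function names and the round-trip structure are assumptions of this sketch, not NVIDIA's implementation.

```python
import numpy as np

# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one 16-element micro-block to E2M1 with a shared scale.
    The scale is kept in full precision here; NVFP4 stores it in FP8 E4M3."""
    amax = np.abs(block).max()
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block onto [-6, 6]
    scaled = block / scale
    # Snap each magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx], scale

def fake_quantize(x: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Round-trip a 1-D tensor through micro-block FP4 (quantize, then dequantize)."""
    out = np.empty_like(x)
    for i in range(0, len(x), block_size):
        q, scale = quantize_block(x[i:i + block_size])
        out[i:i + block_size] = q * scale
    return out

x = np.random.randn(64).astype(np.float32)
print("max abs round-trip error:", np.abs(x - fake_quantize(x)).max())
```

Round-tripping through quantize/dequantize like this ("fake quantization") is a standard way to study the accuracy impact of a low-precision format without needing custom hardware kernels.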
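The stochastic rounding step can be illustrated just as briefly: instead of always rounding to the nearest representable value, each value is rounded up or down at random with probability proportional to its distance from the two neighboring grid points, so the rounding error is zero in expectation. The sketch below assumes inputs already lie within the representable range; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Round each value to one of its two neighboring grid points, with
    probability proportional to proximity, making the rounding unbiased."""
    hi_idx = np.searchsorted(grid, x).clip(1, len(grid) - 1)
    lo, hi = grid[hi_idx - 1], grid[hi_idx]
    p_up = (x - lo) / (hi - lo)  # closer to hi -> higher chance of rounding up
    return np.where(rng.random(x.shape) < p_up, hi, lo)

# Signed FP4 E2M1 grid (sorted ascending), unit scale for illustration.
grid = np.array([-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
                 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
x = np.full(100_000, 0.7)
print(stochastic_round(x, grid).mean())  # ~0.7: unbiased in expectation

nearest = grid[np.abs(x[:, None] - grid[None, :]).argmin(axis=1)]
print(nearest.mean())  # 0.5: round-to-nearest is systematically biased here
```

Because the expectation recovers the true value, this kind of rounding keeps small systematic errors from compounding over billions of gradient updates, which is why the recipe applies it to gradients rather than relying on round-to-nearest everywhere.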