
Defeating Nondeterminism in LLM Inference

  • #inference
  • #LLM
  • #determinism
  • LLM inference is nondeterministic in practice because floating-point arithmetic is non-associative and serving batch sizes vary with server load.
  • Floating-point non-associativity means (a + b) + c need not equal a + (b + c), so summing the same values in a different order can produce different results (see the non-associativity sketch after this list).
  • Batch-size variation in inference servers makes results nondeterministic because most kernels are not batch-invariant: the same request can yield different numerics depending on how many other requests are batched with it (see the matmul sketch below).
  • Achieving deterministic LLM inference requires batch-invariant kernels for operations like RMSNorm, matrix multiplication, and attention (an RMSNorm sketch follows the list).
  • Batch-invariant attention requires a consistent reduction order over the KV cache regardless of how many tokens are processed per step, e.g. prefill versus decode (see the fixed-split attention sketch below).
  • Deterministic inference enables true on-policy reinforcement learning by making sampling and training numerics bitwise identical.
  • The performance cost of batch-invariant kernels is manageable, and optimized attention kernels can recover much of the gap.
  • The community is encouraged to address nondeterminism in ML systems for reproducibility and reliability.
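A minimal demonstration of floating-point non-associativity, along the lines of the example in the original post (the specific values are illustrative):

```python
# (a + b) + c and a + (b + c) can differ in floating point.
import numpy as np

a, b, c = np.float32(0.1), np.float32(1e20), np.float32(-1e20)
print((a + b) + c)  # 0.0 -- the 0.1 is absorbed by the huge intermediate sum
print(a + (b + c))  # 0.1 -- the large terms cancel first, preserving 0.1
```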
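A quick way to see the batch-size sensitivity: multiply one row through a matmul alone, then as part of a larger batch, and compare. This is a sketch of the kind of experiment the post describes; whether the difference is nonzero depends on hardware, library versions, and the kernels dispatched (on CPU it may well print zero):

```python
import torch

torch.manual_seed(0)
A = torch.randn(2048, 2048, dtype=torch.bfloat16, device="cuda")
B = torch.randn(2048, 2048, dtype=torch.bfloat16, device="cuda")

row_alone    = torch.mm(A[:1], B)   # the first row, computed at batch size 1
row_in_batch = torch.mm(A, B)[:1]   # the same row, computed inside the full batch

# Mathematically identical; numerically they can differ because the
# library may pick a different tiling/reduction strategy per batch size.
print((row_alone - row_in_batch).abs().max())
```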
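For RMSNorm, batch invariance means each row's reduction order over the hidden dimension is fixed no matter how many rows arrive. A minimal data-parallel sketch of that property (the function name and eps are placeholders; true invariance also requires that the underlying kernel not switch reduction strategy with batch size):

```python
import torch

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Reduce over the hidden dimension only, one row at a time, so a
    # row's reduction order never depends on how many rows are batched.
    ms = x.float().pow(2).mean(dim=-1, keepdim=True)  # per-row mean of squares
    return (x.float() * torch.rsqrt(ms + eps)).to(x.dtype) * weight
```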
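For attention, one batch-invariant strategy the post describes is to split the KV cache into fixed-size chunks, so the reduction order depends only on absolute token positions rather than on how many tokens are processed per step. A single-query NumPy sketch of that idea (names and the split size are illustrative, not the post's actual kernel):

```python
import numpy as np

def attention_fixed_splits(q, k, v, split_size=256):
    """Single-query attention reduced over fixed-size KV chunks.

    q: (d,), k: (n, d), v: (n, d). Chunk boundaries depend only on
    absolute KV positions, so the floating-point reduction order is
    the same whether the sequence was prefilled at once or decoded
    token by token.
    """
    d = q.shape[-1]
    m = -np.inf                          # running max of attention scores
    l = 0.0                              # running softmax denominator
    o = np.zeros(d, dtype=np.float64)    # running weighted sum of values
    for start in range(0, k.shape[0], split_size):
        s = k[start:start + split_size] @ q / np.sqrt(d)  # chunk scores
        m_new = max(m, float(s.max()))
        scale = np.exp(m - m_new)        # rescale previous partial results
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        o = o * scale + p @ v[start:start + split_size]
        m = m_new
    return o / l
```

Because the chunks and the merge order are fixed, every partial sum is combined the same way at every batch size, which is exactly the property the batched kernels need.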