4-bit floating point FP4
- #Low-Precision Computing
- #Neural Networks
- #Floating Point Precision
- Historically, floating point numbers standardized on 32-bit and 64-bit precision: C uses 'float' for 32-bit and 'double' for 64-bit, while Python's 'float' is 64-bit.
- Neural networks drive demand for lower precision floating point formats like 16-bit, 8-bit (FP8), and 4-bit (FP4) to fit more parameters in memory.
- FP4 uses one sign bit and splits the remaining three bits between exponent and mantissa; common layouts include E3M0, E2M1, E1M2, and E0M3, of which E2M1 is the most widely supported.
- The value of a normal FP4 number is (−1)^s · 2^(e−b) · (1 + m/2), where the bias b shifts the stored exponent so both positive and negative exponents can be represented; when e = 0 the number is subnormal and the value is instead (−1)^s · 2^(1−b) · (m/2).
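The decoding rule above can be sketched as a small Python function for the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, bias 1); the function name and bit ordering are illustrative, not from the original article:

```python
def decode_e2m1(bits: int) -> float:
    """Decode a 4-bit E2M1 pattern: sign | 2 exponent bits | 1 mantissa bit."""
    s = (bits >> 3) & 1          # sign bit
    e = (bits >> 1) & 0b11       # 2-bit stored exponent
    m = bits & 1                 # 1-bit mantissa
    sign = -1.0 if s else 1.0
    bias = 1
    if e == 0:
        # subnormal: exponent is fixed at 1 - bias, no implicit leading 1
        return sign * 2 ** (1 - bias) * (m / 2)
    # normal: (-1)^s * 2^(e - bias) * (1 + m/2)
    return sign * 2 ** (e - bias) * (1 + m / 2)
```

For example, the pattern 0111 (s=0, e=3, m=1) decodes to 2^2 · 1.5 = 6, the largest E2M1 value.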
- A 16-value table for E2M1 (bias 1) shows the representable numbers ±0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, and ±6, illustrating how they distribute on log and linear scales.
- The Pychop library emulates reduced-precision formats like FP4, with code provided to generate and display the E2M1 table of values.
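Since Pychop's exact API is not shown in the summary, the 16-entry E2M1 table can also be generated directly from the decoding formula in plain Python; this is a standalone sketch, not Pychop's code:

```python
BIAS = 1  # exponent bias for E2M1

def e2m1_value(s: int, e: int, m: int) -> float:
    """Value encoded by sign bit s, 2-bit exponent e, 1-bit mantissa m."""
    sign = -1.0 if s else 1.0
    if e == 0:  # subnormal: exponent fixed at 1 - BIAS, fraction m/2
        return sign * 2 ** (1 - BIAS) * (m / 2)
    return sign * 2 ** (e - BIAS) * (1 + m / 2)

# Enumerate all 16 bit patterns and print the value each one encodes.
for bits in range(16):
    s, e, m = bits >> 3, (bits >> 1) & 0b11, bits & 1
    print(f"{bits:04b}  s={s} e={e:02b} m={m}  ->  {e2m1_value(s, e, m):+.1f}")
```

Running this reproduces the full E2M1 table, with values ranging from −6 to +6 and both a positive and a negative zero.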
- Future discussions may cover other 4-bit formats like NF4, which better match the weight distributions in large language models (LLMs).