Hasty Briefs


4-bit floating point FP4

  • #Low-Precision Computing
  • #Neural Networks
  • #Floating Point Precision
  • Historically, floating point computing standardized on 32-bit and 64-bit precision: C names these 'float' and 'double', while Python's 'float' is 64-bit.
  • Neural networks drive demand for lower precision floating point formats like 16-bit, 8-bit (FP8), and 4-bit (FP4) to fit more parameters in memory.
  • FP4 uses one bit for the sign and splits the remaining three bits between exponent and mantissa; in the ExMy naming, common layouts are E3M0, E2M1, E1M2, and E0M3, with E2M1 the most widely supported.
  • The value of a normal E2M1 number (e ≠ 0) is (−1)^s * 2^(e−b) * (1 + m/2), where the bias b (1 for E2M1) shifts the stored exponent so both positive and negative exponents are representable; when e = 0 the number is subnormal, with value (−1)^s * 2^(1−b) * (m/2).
  • A 16-value table for E2M1 format shows representable numbers from -6 to +6, including ±0, illustrating the distribution on log and linear scales.
  • The Pychop library emulates reduced-precision formats like FP4, with code provided to generate and display the E2M1 table of values.
  • Future discussions may cover other 4-bit formats like NF4, which better match the weight distributions in large language models (LLMs).
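The encoding rules above can be checked with a short decoder. This is an illustrative sketch, not the Pychop code the article refers to, and it assumes the usual bias of 1 for E2M1:

```python
# Hypothetical E2M1 decoder (1 sign, 2 exponent, 1 mantissa bit).
def e2m1_value(bits: int) -> float:
    s = (bits >> 3) & 0b1   # sign bit
    e = (bits >> 1) & 0b11  # 2-bit exponent field
    m = bits & 0b1          # 1-bit mantissa field
    bias = 1
    if e == 0:
        # subnormal: fixed exponent 1 - bias, no implicit leading 1
        mag = 2.0 ** (1 - bias) * (m / 2)
    else:
        # normal: (-1)^s * 2^(e - bias) * (1 + m/2)
        mag = 2.0 ** (e - bias) * (1 + m / 2)
    return -mag if s else mag

# Print all 16 bit patterns; magnitudes run 0, 0.5, 1, 1.5, 2, 3, 4, 6.
for bits in range(16):
    print(f"{bits:04b} -> {e2m1_value(bits):+.1f}")
```

Enumerating the positive half reproduces the eight magnitudes of the 16-value table described above.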
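To see how such a format would be applied to neural-network weights, here is a minimal round-to-nearest quantizer onto the E2M1 value grid. The grid constant and helper name are illustrative, not from the article or any library:

```python
# The 8 non-negative E2M1 magnitudes (bias 1); negating them gives the
# full set of representable values from -6 to +6.
E2M1_POSITIVE = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_GRID = sorted({v for v in E2M1_POSITIVE} | {-v for v in E2M1_POSITIVE})

def quantize_e2m1(x: float) -> float:
    """Round x to the nearest representable E2M1 value (saturates at +/-6)."""
    return min(E2M1_GRID, key=lambda v: abs(v - x))

weights = [0.07, -0.9, 1.7, 5.2, 8.0]
print([quantize_e2m1(w) for w in weights])  # -> [0.0, -1.0, 1.5, 6.0, 6.0]
```

Note how the spacing between representable values doubles past each power of two, which is the log-scale distribution the article's table illustrates.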