Speculative Sampling Explained

3 months ago

Speculative sampling draws tokens from a draft distribution and corrects them so that the result matches sampling directly from the target distribution.
Target distribution is $p(x)$, draft distribution is $q(x)$.
Tokens can be over-sampled ($q(x_i) > p(x_i)$) or under-sampled ($q(x_i) < p(x_i)$).
Over-sampled tokens are down-sampled by accepting with probability $p(x_i)/q(x_i)$.
Under-sampled tokens are up-sampled using a residual distribution.
Residual distribution is defined as $r(x_i) = \frac{\max(0, p(x_i) - q(x_i))}{\sum_{x_j} \max(0, p(x_j) - q(x_j))}$.
Rejection triggers re-sampling from the residual distribution.
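Total rejection probability is $\sum_{x_j} \max(0, q(x_j) - p(x_j))$, which equals the normalization constant of the residual distribution, $\sum_{x_j} \max(0, p(x_j) - q(x_j))$, because $p$ and $q$ both sum to 1.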
The final sample is distributed exactly according to the target distribution $p(x)$.
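
The accept/reject procedure summarized above can be sketched for a single token as follows. This is a minimal illustration, not a reference implementation: it assumes `p` and `q` are NumPy arrays holding the target and draft probabilities over the same small vocabulary, and the function names `residual_distribution` and `speculative_sample` are hypothetical, chosen only for this example.

```python
import numpy as np

def residual_distribution(p, q):
    """r(x) = max(0, p(x) - q(x)), normalized to sum to 1."""
    diff = np.maximum(0.0, p - q)
    total = diff.sum()
    if total == 0.0:
        # p == q everywhere: rejection never happens, so any fallback works.
        return p.copy()
    return diff / total

def speculative_sample(p, q, rng):
    """Draw one token distributed according to p, proposing from q."""
    x = rng.choice(len(q), p=q)            # propose from the draft distribution
    accept_prob = min(1.0, p[x] / q[x])    # over-sampled tokens are thinned out
    if rng.random() < accept_prob:
        return x
    # Rejected: re-sample from the residual distribution instead.
    return rng.choice(len(p), p=residual_distribution(p, q))

# Illustrative distributions (assumed values, not taken from the article).
p = np.array([0.5, 0.3, 0.2])   # target
q = np.array([0.2, 0.5, 0.3])   # draft

rng = np.random.default_rng(0)
samples = [speculative_sample(p, q, rng) for _ in range(20_000)]
freq = np.bincount(samples, minlength=len(p)) / len(samples)
print(freq)  # empirical frequencies, expected to be close to p
```

In this example only the first token is under-sampled by the draft, so the residual distribution puts all of its mass there, and the total rejection probability is $0.2 + 0.1 = 0.3$, matching the normalization constant $0.5 - 0.2 = 0.3$.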
