Speculative Sampling Explained
3 months ago
- #probability
- #machine-learning
- #sampling
- Speculative sampling draws tokens from a draft distribution and corrects them so that the output has exactly the same distribution as sampling from the target distribution.
- Target distribution is $p(x)$, draft distribution is $q(x)$.
- Tokens can be over-sampled ($q(x_i) > p(x_i)$) or under-sampled ($q(x_i) < p(x_i)$).
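- Over-sampled tokens are down-sampled: a draft token $x_i$ is accepted with probability $p(x_i)/q(x_i) < 1$.
- Under-sampled tokens are always accepted, since the general acceptance probability $\min(1, p(x_i)/q(x_i))$ equals 1 for them. Their missing mass $p(x_i) - q(x_i)$ is made up by a residual distribution.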
- Residual distribution is defined as $r(x_i) = \frac{\max(0, p(x_i) - q(x_i))}{\sum_{j} \max(0, p(x_j) - q(x_j))}$.
- On rejection, the token is re-sampled from the residual distribution instead.
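- Total rejection probability equals the normalization constant of the residual distribution: $\sum_{x} \max(0, q(x) - p(x)) = \sum_{x} \max(0, p(x) - q(x))$, because $p$ and $q$ both sum to 1.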
- The output token is therefore distributed exactly according to the target distribution $p(x)$.
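The last two bullets follow from a short calculation. Here $Z$ is shorthand, introduced only for this derivation, for the residual normalizer $\sum_x \max(0, p(x) - q(x))$, so that $r(x) = \max(0, p(x) - q(x))/Z$. Terms with $q(x) = 0$ contribute nothing to the draft side. The third equality in the first line uses $\sum_x (p(x) - q(x)) = 0$ together with $p(x) - q(x) = \max(0, p(x) - q(x)) - \max(0, q(x) - p(x))$. The second line substitutes $P(\text{reject}) = Z$, so $P(\text{reject})\, r(x) = \max(0, p(x) - q(x))$.

$$
\begin{aligned}
P(\text{reject}) &= \sum_{x} q(x)\left(1 - \min\left(1, \frac{p(x)}{q(x)}\right)\right) = \sum_{x} \max(0,\, q(x) - p(x)) = \sum_{x} \max(0,\, p(x) - q(x)) = Z, \\
P(\text{output} = x) &= q(x)\min\left(1, \frac{p(x)}{q(x)}\right) + P(\text{reject})\, r(x) = \min(p(x), q(x)) + \max(0,\, p(x) - q(x)) = p(x).
\end{aligned}
$$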
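As a sanity check, the accept-or-resample rule can be evaluated exactly instead of sampled. This is a minimal sketch that assumes NumPy and token distributions stored as 1-D arrays over a small vocabulary. The helper `speculative_output` and the toy values of $p$ and $q$ are hypothetical. It returns the output distribution together with the total rejection probability and the residual normalizer, so the two can be compared.

```python
import numpy as np

def speculative_output(p: np.ndarray, q: np.ndarray):
    """Exact output distribution of one accept-or-resample step (hypothetical helper).

    Returns (output distribution, total rejection probability, residual normalizer).
    """
    # q(x) * min(1, p(x)/q(x)) simplifies to min(p(x), q(x)): the mass kept by acceptance.
    # Writing it this way also avoids dividing by q(x) = 0.
    accepted = np.minimum(p, q)
    reject_total = 1.0 - accepted.sum()
    # Residual r(x) = max(0, p(x) - q(x)) / Z, where Z is the normalizer.
    excess = np.maximum(0.0, p - q)
    z = excess.sum()
    # If p == q nothing is ever rejected, so r is never used.
    r = excess / z if z > 0 else np.zeros_like(p)
    return accepted + reject_total * r, reject_total, z

p = np.array([0.5, 0.3, 0.2])  # hypothetical target distribution
q = np.array([0.2, 0.3, 0.5])  # hypothetical draft distribution
out, reject_total, z = speculative_output(p, q)
print(out, reject_total, z)
```

In this toy example $q$ over-samples token 2 and under-samples token 0. The rejection probability and the normalizer both come out to 0.3 (up to floating-point rounding), and all of the residual mass goes to token 0, which restores $p$.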
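The same rule can also be run as a sampler: draw a draft token from $q$, accept it with probability $\min(1, p(x)/q(x))$, and otherwise re-sample from the residual. This is a minimal single-token sketch, assuming NumPy's `Generator` API and toy arrays for $p$ and $q$. The function name `speculative_sample` is hypothetical.

```python
import numpy as np

def speculative_sample(p: np.ndarray, q: np.ndarray, rng: np.random.Generator):
    """Draw one token distributed as p using a draft proposal from q.

    Returns (token index, whether the draft token was accepted).
    """
    x = rng.choice(len(q), p=q)  # draft proposal x ~ q, so q[x] > 0
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True  # accepted: keep the draft token
    # Rejected: re-sample from the residual max(0, p - q) / Z.
    # A rejection implies p != q, so the residual mass is positive.
    excess = np.maximum(0.0, p - q)
    return rng.choice(len(p), p=excess / excess.sum()), False

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # hypothetical target distribution
q = np.array([0.2, 0.3, 0.5])  # hypothetical draft distribution
draws = [speculative_sample(p, q, rng) for _ in range(20_000)]
tokens = np.array([t for t, _ in draws])
accept_rate = np.mean([a for _, a in draws])
freq = np.bincount(tokens, minlength=len(p)) / len(tokens)
print(freq, accept_rate)
```

The empirical frequencies should approach $p$, and the acceptance rate should approach one minus the residual normalizer, which is $1 - 0.3 = 0.7$ for these toy values.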