Speculative Sampling Explained

3 months ago
  • #probability
  • #machine-learning
  • #sampling
  • Speculative sampling draws from a cheaper draft distribution and then corrects the draw, so the final sample is distributed exactly as if it had come from the target distribution.
  • Target distribution is $p(x)$, draft distribution is $q(x)$.
  • Tokens can be over-sampled ($q(x_i) > p(x_i)$) or under-sampled ($q(x_i) < p(x_i)$).
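  • Over-sampled tokens are down-sampled: a draft sample $x_i$ is accepted with probability $p(x_i)/q(x_i)$. In general the acceptance probability is $\min(1, p(x_i)/q(x_i))$, so an under-sampled token is always accepted when drafted.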
  • Under-sampled tokens are up-sampled using a residual distribution.
  • Residual distribution is defined as $r(x_i) = \frac{\max(0, p(x_i) - q(x_i))}{\sum_{x_j} \max(0, p(x_j) - q(x_j))}$, where the sum runs over every token $x_j$ in the vocabulary.
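  • When a draft sample is rejected, a replacement is drawn from the residual distribution.
  • The total rejection probability, $\sum_{x_j} \max(0, q(x_j) - p(x_j))$, equals the normalization constant $\sum_{x_j} \max(0, p(x_j) - q(x_j))$ of the residual distribution, because $p$ and $q$ both sum to 1.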
  • Final sampling result recovers the target distribution $p(x)$.
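
The steps above can be sketched in a few lines of Python. This is a minimal illustration for a single token over a small discrete vocabulary, not a reference implementation: the function names (`residual`, `speculative_sample`, `exact_output_distribution`) and the example values of `p` and `q` are assumptions made up for this sketch, and it assumes $q(x) > 0$ for every token that can be drafted.

```python
import random


def residual(p, q):
    """Return (r, z): r(x) = max(0, p(x) - q(x)) / z, with z the normalization constant."""
    gaps = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    z = sum(gaps)
    if z == 0.0:
        # p == q everywhere: nothing is ever rejected, so r is never used.
        return list(p), 0.0
    return [g / z for g in gaps], z


def speculative_sample(p, q, rng=random):
    """Draw one token index distributed as p, using a draft sample from q."""
    x = rng.choices(range(len(q)), weights=q)[0]  # draft sample from q
    if rng.random() < min(1.0, p[x] / q[x]):      # accept with prob min(1, p/q)
        return x
    r, _ = residual(p, q)                         # rejected: re-sample from r
    return rng.choices(range(len(r)), weights=r)[0]


def exact_output_distribution(p, q):
    """Closed-form distribution of speculative_sample, for checking the claims."""
    r, z = residual(p, q)
    # Probability of drafting x and accepting it: q(x) * min(1, p(x)/q(x)) = min(p(x), q(x)).
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_prob = 1.0 - sum(accepted)
    out = [a + reject_prob * ri for a, ri in zip(accepted, r)]
    return out, reject_prob, z


# Hypothetical example: token 0 is under-sampled, tokens 1 and 2 are over-sampled.
p = [0.5, 0.3, 0.2]  # target distribution
q = [0.2, 0.5, 0.3]  # draft distribution

out, reject_prob, z = exact_output_distribution(p, q)
print("output distribution:", out)
print("rejection probability:", reject_prob, "normalization constant:", z)
```

The closed-form check mirrors the last two bullets: the accepted mass for each token is $\min(p(x), q(x))$, the rejected mass is redistributed through $r$, and the two add up to $p(x)$. Calling `speculative_sample(p, q)` many times and counting outcomes should give frequencies close to `p` as well.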