Dummy's Guide to Modern LLM Sampling
- #Text Generation
- #LLM
- #Sampling
- LLMs generate text by predicting the next token based on probabilities learned during training.
- Tokens are sub-word units, used instead of whole words or individual letters because they balance vocabulary size and efficiency while still carrying meaning.
- Sampling introduces controlled randomness to avoid repetitive and deterministic outputs.
- Temperature divides the logits before the softmax, flattening the distribution for values above 1 (more creative/random) and sharpening it for values below 1 (more deterministic); a sketch follows the list.
- Presence Penalty subtracts a flat, one-time penalty from any token that has already appeared, however often.
- Frequency Penalty subtracts a penalty that grows with each token's occurrence count.
- Repetition Penalty penalizes tokens seen in the prompt or the generated text multiplicatively: positive logits are divided by the penalty and negative logits multiplied by it (all three are sketched together below the list).
- DRY (Don't Repeat Yourself) penalizes tokens that would extend an n-gram pattern already present in the context, with a penalty that grows exponentially with the length of the repeated pattern (sketch below).
- Top-K restricts sampling to the K most likely tokens.
- Top-P (nucleus sampling) keeps the smallest set of most-likely tokens whose cumulative probability exceeds the threshold P (both sketched below).
- Min-P drops tokens whose probability is below P times that of the most likely token, a threshold that scales with the model's confidence.
- Top-A drops tokens below A times the squared probability of the most likely token, so the cutoff tightens sharply as confidence grows (both sketched below).
- XTC (eXclude Top Choices) occasionally removes every token above a probability threshold except the least likely of them, forcing the model off its most predictable continuations (sketch below).
- Top-N-Sigma keeps only tokens whose logit lies within N standard deviations of the maximum logit, an adaptive threshold that widens when the model is uncertain (sketch below).
- Tail-Free Sampling locates the 'elbow' of the sorted probability curve via its second derivative and cuts off the long tail beyond it (sketch below).
- Eta Cutoff removes tokens below a threshold that adapts to the entropy of the distribution, loosening when the model is uncertain.
- Epsilon Cutoff removes tokens below a fixed probability floor (both sketched below).
- Locally Typical Sampling keeps tokens whose surprisal is closest to the entropy (the expected surprisal) of the distribution, favoring 'typical' rather than merely likely continuations (sketch below).
- Quadratic Sampling reshapes the distribution by applying quadratic and cubic transformations to the logits.
- Mirostat Sampling holds perplexity roughly constant by truncating the distribution at a surprisal threshold that a feedback loop adjusts after every sampled token (sketch below).
- Dynamic Temperature Sampling scales the temperature with the entropy of the distribution, so confident steps get less randomness and uncertain steps more (sketch below).
- Beam Search explores multiple paths simultaneously to find the best overall sequence.
- Contrastive Search balances likelihood and diversity by penalizing similarity to the context.
- Sampler Order significantly affects the final output; a typical pipeline applies penalties first, then temperature, and finally the filtering methods (a combined sketch closes out the list below).
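
A minimal sketch of temperature scaling, assuming NumPy; the function names, zero-division guard, and example logits are illustrative, not taken from any particular library:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Dividing logits by T > 1 flattens the softmax output (more random);
    # T < 1 sharpens it (more deterministic); T == 1 leaves it unchanged.
    return logits / max(temperature, 1e-8)

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(softmax(apply_temperature(logits, 0.5)))  # mass concentrates on the top token
print(softmax(apply_temperature(logits, 1.5)))  # mass spreads across alternatives
```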
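
The three overlapping penalties in one hedged sketch; the signature and defaults are mine, but the arithmetic follows the common convention (flat subtraction for presence, count-scaled subtraction for frequency, sign-aware multiplication for repetition):

```python
import numpy as np
from collections import Counter

def apply_penalties(logits: np.ndarray, context_ids: list[int],
                    presence: float = 0.0, frequency: float = 0.0,
                    repetition: float = 1.0) -> np.ndarray:
    out = logits.copy()
    for token_id, count in Counter(context_ids).items():
        out[token_id] -= presence           # flat penalty: token was seen at all
        out[token_id] -= frequency * count  # grows with the occurrence count
        # Repetition penalty is multiplicative: dividing a positive logit or
        # multiplying a negative one both push the token toward "less likely".
        if out[token_id] > 0:
            out[token_id] /= repetition
        else:
            out[token_id] *= repetition
    return out
```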
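
A naive O(n²) sketch of the DRY idea; production implementations use faster suffix matching, and the parameter names and defaults here are illustrative:

```python
import numpy as np

def dry_penalty(logits: np.ndarray, context: list[int],
                multiplier: float = 0.8, base: float = 1.75,
                allowed_length: int = 2) -> np.ndarray:
    out = logits.copy()
    n = len(context)
    best: dict[int, int] = {}  # candidate next token -> longest match found
    for i in range(n - 1):  # position i ends an earlier occurrence of a pattern
        # Length of the common suffix between context[:i+1] and the full context.
        length = 0
        while length <= i and context[i - length] == context[-1 - length]:
            length += 1
        if length > allowed_length:
            token = context[i + 1]  # the token that followed the earlier match
            best[token] = max(best.get(token, 0), length)
    for token, length in best.items():
        # Penalty grows exponentially with the length of the repeated pattern.
        out[token] -= multiplier * base ** (length - allowed_length)
    return out
```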
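
Top-K and Top-P as one sketch over an already-softmaxed probability vector; function names are mine:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    cutoff = np.sort(probs)[-k]  # probability of the k-th most likely token
    kept = np.where(probs >= cutoff, probs, 0.0)
    return kept / kept.sum()

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    order = np.argsort(probs)[::-1]          # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep tokens while the mass *before* them is still under p, so the token
    # that pushes the total past p is included.
    keep = cumulative - probs[order] < p
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum()
```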
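
Min-P and Top-A side by side; the only difference is the exponent on the top token's probability (defaults are illustrative):

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    # Threshold scales linearly with the top token's probability.
    kept = np.where(probs >= min_p * probs.max(), probs, 0.0)
    return kept / kept.sum()

def top_a_filter(probs: np.ndarray, a: float = 0.2) -> np.ndarray:
    # Squaring makes the cutoff tighten sharply once the model is confident.
    kept = np.where(probs >= a * probs.max() ** 2, probs, 0.0)
    return kept / kept.sum()
```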
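
A sketch of XTC under its usual two-parameter formulation (a probability threshold plus an activation probability); names and defaults are illustrative:

```python
import numpy as np

def xtc_filter(probs: np.ndarray, threshold: float = 0.1,
               probability: float = 0.5,
               rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    if rng.random() >= probability:
        return probs                        # most steps: leave the distribution alone
    above = np.flatnonzero(probs >= threshold)
    if len(above) < 2:
        return probs                        # nothing to exclude without emptying it
    keep = above[np.argmin(probs[above])]   # spare the least likely "top choice"
    out = probs.copy()
    out[above] = 0.0
    out[keep] = probs[keep]
    return out / out.sum()
```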
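
Top-N-Sigma in a few lines; note that it operates on raw logits rather than probabilities, which is the point of the method:

```python
import numpy as np

def top_n_sigma_filter(logits: np.ndarray, n: float = 1.0) -> np.ndarray:
    # Adaptive cutoff: n standard deviations below the maximum logit.
    cutoff = logits.max() - n * logits.std()
    masked = np.where(logits >= cutoff, logits, -np.inf)  # exp(-inf) == 0
    exp = np.exp(masked - logits.max())
    return exp / exp.sum()
```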
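
A rough sketch of Tail-Free Sampling; the exact off-by-one conventions at the elbow vary between implementations, so treat the cut point as approximate:

```python
import numpy as np

def tail_free_filter(probs: np.ndarray, z: float = 0.95) -> np.ndarray:
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    d2 = np.abs(np.diff(sorted_probs, n=2))   # curvature of the sorted curve
    if len(d2) == 0 or d2.sum() == 0:
        return probs                          # too few tokens or a flat curve
    weights = d2 / d2.sum()                   # treat curvature as a distribution
    # Cut where the cumulative curvature passes z: the elbow of the curve.
    keep_count = int(np.searchsorted(np.cumsum(weights), z)) + 1
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[:keep_count]] = True
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum()
```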
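
Epsilon and Eta cutoffs together; the eta formula follows the published min(η, √η·e^(−H)) rule, the rest is illustrative:

```python
import numpy as np

def epsilon_cutoff(probs: np.ndarray, epsilon: float = 3e-4) -> np.ndarray:
    kept = np.where(probs >= epsilon, probs, 0.0)  # fixed floor
    return kept / kept.sum()

def eta_cutoff(probs: np.ndarray, eta: float = 3e-4) -> np.ndarray:
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    # The floor is eta when the model is confident and shrinks below eta
    # as entropy rises, so uncertain steps keep more candidates.
    threshold = min(eta, np.sqrt(eta) * np.exp(-entropy))
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()
```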
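
Locally Typical Sampling as a sketch: rank tokens by how far their surprisal sits from the entropy, then keep the closest ones until their mass reaches the target:

```python
import numpy as np

def typical_filter(probs: np.ndarray, typical_p: float = 0.9) -> np.ndarray:
    surprisal = -np.log(probs + 1e-12)
    entropy = np.sum(probs * surprisal)              # expected surprisal
    order = np.argsort(np.abs(surprisal - entropy))  # most "typical" first
    cumulative = np.cumsum(probs[order])
    keep_count = int(np.searchsorted(cumulative, typical_p)) + 1
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[:keep_count]] = True
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum()
```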
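
A sketch of one Mirostat 2.0 step, assuming the caller threads the running μ through the generation loop (conventionally initialized to 2τ); names and defaults are illustrative:

```python
import numpy as np

def mirostat_v2_step(probs: np.ndarray, mu: float, tau: float = 5.0,
                     eta: float = 0.1,
                     rng: np.random.Generator = np.random.default_rng()):
    surprisal = -np.log2(probs + 1e-12)
    allowed = surprisal < mu              # truncate tokens "too surprising" for mu
    if not allowed.any():
        allowed[np.argmax(probs)] = True  # never empty the distribution
    kept = np.where(allowed, probs, 0.0)
    kept = kept / kept.sum()
    token = int(rng.choice(len(probs), p=kept))
    observed = -np.log2(kept[token])
    mu -= eta * (observed - tau)          # feedback: steer surprisal toward tau
    return token, mu                      # start the loop with mu = 2 * tau
```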
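
Dynamic temperature as a sketch: map the normalized entropy of the distribution into a [min_temp, max_temp] range (the exponent shapes the mapping; all defaults are illustrative):

```python
import numpy as np

def dynamic_temperature(logits: np.ndarray, min_temp: float = 0.5,
                        max_temp: float = 1.5, exponent: float = 1.0) -> np.ndarray:
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    max_entropy = np.log(len(probs))      # entropy of a uniform distribution
    normalized = (entropy / max_entropy) ** exponent
    # Low entropy (confident) -> temperature near min_temp; high -> near max_temp.
    temperature = min_temp + (max_temp - min_temp) * normalized
    return logits / temperature
```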
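
Finally, a toy end-to-end pipeline illustrating the order described above: penalties on raw logits, then temperature, then softmax, then truncation, then sampling. Everything about it (the presence-only penalty, the parameter defaults) is simplified for illustration:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, context_ids: list[int],
                      presence: float = 0.3, temperature: float = 0.8,
                      top_k: int = 40, top_p: float = 0.95,
                      rng: np.random.Generator = np.random.default_rng()) -> int:
    out = logits.copy()
    for token_id in set(context_ids):     # 1. penalties act on the raw logits
        out[token_id] -= presence
    out /= temperature                    # 2. temperature rescales the logits
    exp = np.exp(out - out.max())
    probs = exp / exp.sum()               # 3. softmax to probabilities
    cutoff = np.sort(probs)[-min(top_k, len(probs))]
    probs = np.where(probs >= cutoff, probs, 0.0)  # 4. top-k truncation
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]       # 5. top-p truncation on the survivors
    keep = np.cumsum(probs[order]) - probs[order] < top_p
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))  # 6. sample from what is left
```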