Dummy's Guide to Modern LLM Sampling
- #Text Generation
- #LLM
- #Sampling
- LLMs generate text by predicting the next token based on probabilities learned during training.
- Tokens are sub-word units, used instead of whole words or individual letters because they balance vocabulary size and efficiency while still carrying meaning.
- Sampling introduces controlled randomness to avoid repetitive and deterministic outputs.
- Temperature divides the logits before the softmax, flattening the distribution for values above 1 (more creative/random) and sharpening it for values below 1 (more deterministic); a sketch follows the list.
- Presence Penalty subtracts a flat, one-time penalty from any token that has already appeared, however often.
- Frequency Penalty subtracts a penalty that grows with each token's occurrence count.
- Repetition Penalty penalizes tokens seen in the prompt or the generated text multiplicatively: positive logits are divided by the penalty and negative logits multiplied by it (all three are sketched together below the list).
- DRY (Don't Repeat Yourself) penalizes tokens that would extend an n-gram pattern already present in the context, with a penalty that grows exponentially with the length of the repeated pattern (sketch below).
- Top-K restricts sampling to the K most likely tokens.
- Top-P (nucleus sampling) keeps the smallest set of most-likely tokens whose cumulative probability exceeds the threshold P (both sketched below).
- Min-P drops tokens whose probability is below P times that of the most likely token, a threshold that scales with the model's confidence.
- Top-A drops tokens below A times the squared probability of the most likely token, so the cutoff tightens sharply as confidence grows (both sketched below).
- XTC (eXclude Top Choices) occasionally removes every token above a probability threshold except the least likely of them, forcing the model off its most predictable continuations (sketch below).
- Top-N-Sigma keeps only tokens whose logit lies within N standard deviations of the maximum logit, an adaptive threshold that widens when the model is uncertain (sketch below).
- Tail-Free Sampling locates the 'elbow' of the sorted probability curve via its second derivative and cuts off the long tail beyond it (sketch below).
- Eta Cutoff removes tokens below a threshold that adapts to the entropy of the distribution, loosening when the model is uncertain.
- Epsilon Cutoff removes tokens below a fixed probability floor (both sketched below).
- Locally Typical Sampling keeps tokens whose surprisal is closest to the entropy (the expected surprisal) of the distribution, favoring 'typical' rather than merely likely continuations (sketch below).
- Quadratic Sampling reshapes the distribution by applying quadratic and cubic transformations to the logits.
- Mirostat Sampling holds perplexity roughly constant by truncating the distribution at a surprisal threshold that a feedback loop adjusts after every sampled token (sketch below).
- Dynamic Temperature Sampling scales the temperature with the entropy of the distribution, so confident steps get less randomness and uncertain steps more (sketch below).
- Beam Search explores multiple paths simultaneously to find the best overall sequence.
- Contrastive Search balances likelihood and diversity by penalizing similarity to the context.
- Sampler Order significantly affects the final output; a typical pipeline applies penalties first, then temperature, and finally the filtering methods (a combined sketch closes out the list below).
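
A minimal sketch of temperature scaling, assuming NumPy; the function names, zero-division guard, and example logits are illustrative, not taken from any particular library:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Dividing logits by T > 1 flattens the softmax output (more random);
    # T < 1 sharpens it (more deterministic); T == 1 leaves it unchanged.
    return logits / max(temperature, 1e-8)

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(softmax(apply_temperature(logits, 0.5)))  # mass concentrates on the top token
print(softmax(apply_temperature(logits, 1.5)))  # mass spreads across alternatives
```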
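
The three overlapping penalties in one hedged sketch; the signature and defaults are mine, but the arithmetic follows the common convention (flat subtraction for presence, count-scaled subtraction for frequency, sign-aware multiplication for repetition):

```python
import numpy as np
from collections import Counter

def apply_penalties(logits: np.ndarray, context_ids: list[int],
                    presence: float = 0.0, frequency: float = 0.0,
                    repetition: float = 1.0) -> np.ndarray:
    out = logits.copy()
    for token_id, count in Counter(context_ids).items():
        out[token_id] -= presence           # flat penalty: token was seen at all
        out[token_id] -= frequency * count  # grows with the occurrence count
        # Repetition penalty is multiplicative: dividing a positive logit or
        # multiplying a negative one both push the token toward "less likely".
        if out[token_id] > 0:
            out[token_id] /= repetition
        else:
            out[token_id] *= repetition
    return out
```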
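
A naive O(n²) sketch of the DRY idea; production implementations use faster suffix matching, and the parameter names and defaults here are illustrative:

```python
import numpy as np

def dry_penalty(logits: np.ndarray, context: list[int],
                multiplier: float = 0.8, base: float = 1.75,
                allowed_length: int = 2) -> np.ndarray:
    out = logits.copy()
    n = len(context)
    best: dict[int, int] = {}  # candidate next token -> longest match found
    for i in range(n - 1):  # position i ends an earlier occurrence of a pattern
        # Length of the common suffix between context[:i+1] and the full context.
        length = 0
        while length <= i and context[i - length] == context[-1 - length]:
            length += 1
        if length > allowed_length:
            token = context[i + 1]  # the token that followed the earlier match
            best[token] = max(best.get(token, 0), length)
    for token, length in best.items():
        # Penalty grows exponentially with the length of the repeated pattern.
        out[token] -= multiplier * base ** (length - allowed_length)
    return out
```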
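
Top-K and Top-P as one sketch over an already-softmaxed probability vector; function names are mine:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    cutoff = np.sort(probs)[-k]  # probability of the k-th most likely token
    kept = np.where(probs >= cutoff, probs, 0.0)
    return kept / kept.sum()

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    order = np.argsort(probs)[::-1]          # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep tokens while the mass *before* them is still under p, so the token
    # that pushes the total past p is included.
    keep = cumulative - probs[order] < p
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum()
```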
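
Min-P and Top-A side by side; the only difference is the exponent on the top token's probability (defaults are illustrative):

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    # Threshold scales linearly with the top token's probability.
    kept = np.where(probs >= min_p * probs.max(), probs, 0.0)
    return kept / kept.sum()

def top_a_filter(probs: np.ndarray, a: float = 0.2) -> np.ndarray:
    # Squaring makes the cutoff tighten sharply once the model is confident.
    kept = np.where(probs >= a * probs.max() ** 2, probs, 0.0)
    return kept / kept.sum()
```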
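
A sketch of XTC under its usual two-parameter formulation (a probability threshold plus an activation probability); names and defaults are illustrative:

```python
import numpy as np

def xtc_filter(probs: np.ndarray, threshold: float = 0.1,
               probability: float = 0.5,
               rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    if rng.random() >= probability:
        return probs                        # most steps: leave the distribution alone
    above = np.flatnonzero(probs >= threshold)
    if len(above) < 2:
        return probs                        # nothing to exclude without emptying it
    keep = above[np.argmin(probs[above])]   # spare the least likely "top choice"
    out = probs.copy()
    out[above] = 0.0
    out[keep] = probs[keep]
    return out / out.sum()
```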
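
Top-N-Sigma in a few lines; note that it operates on raw logits rather than probabilities, which is the point of the method:

```python
import numpy as np

def top_n_sigma_filter(logits: np.ndarray, n: float = 1.0) -> np.ndarray:
    # Adaptive cutoff: n standard deviations below the maximum logit.
    cutoff = logits.max() - n * logits.std()
    masked = np.where(logits >= cutoff, logits, -np.inf)  # exp(-inf) == 0
    exp = np.exp(masked - logits.max())
    return exp / exp.sum()
```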
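
A rough sketch of Tail-Free Sampling; the exact off-by-one conventions at the elbow vary between implementations, so treat the cut point as approximate:

```python
import numpy as np

def tail_free_filter(probs: np.ndarray, z: float = 0.95) -> np.ndarray:
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    d2 = np.abs(np.diff(sorted_probs, n=2))   # curvature of the sorted curve
    if len(d2) == 0 or d2.sum() == 0:
        return probs                          # too few tokens or a flat curve
    weights = d2 / d2.sum()                   # treat curvature as a distribution
    # Cut where the cumulative curvature passes z: the elbow of the curve.
    keep_count = int(np.searchsorted(np.cumsum(weights), z)) + 1
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[:keep_count]] = True
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum()
```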
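
Epsilon and Eta cutoffs together; the eta formula follows the published min(η, √η·e^(−H)) rule, the rest is illustrative:

```python
import numpy as np

def epsilon_cutoff(probs: np.ndarray, epsilon: float = 3e-4) -> np.ndarray:
    kept = np.where(probs >= epsilon, probs, 0.0)  # fixed floor
    return kept / kept.sum()

def eta_cutoff(probs: np.ndarray, eta: float = 3e-4) -> np.ndarray:
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    # The floor is eta when the model is confident and shrinks below eta
    # as entropy rises, so uncertain steps keep more candidates.
    threshold = min(eta, np.sqrt(eta) * np.exp(-entropy))
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()
```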
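
Locally Typical Sampling as a sketch: rank tokens by how far their surprisal sits from the entropy, then keep the closest ones until their mass reaches the target:

```python
import numpy as np

def typical_filter(probs: np.ndarray, typical_p: float = 0.9) -> np.ndarray:
    surprisal = -np.log(probs + 1e-12)
    entropy = np.sum(probs * surprisal)              # expected surprisal
    order = np.argsort(np.abs(surprisal - entropy))  # most "typical" first
    cumulative = np.cumsum(probs[order])
    keep_count = int(np.searchsorted(cumulative, typical_p)) + 1
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[:keep_count]] = True
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum()
```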
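
A sketch of one Mirostat 2.0 step, assuming the caller threads the running μ through the generation loop (conventionally initialized to 2τ); names and defaults are illustrative:

```python
import numpy as np

def mirostat_v2_step(probs: np.ndarray, mu: float, tau: float = 5.0,
                     eta: float = 0.1,
                     rng: np.random.Generator = np.random.default_rng()):
    surprisal = -np.log2(probs + 1e-12)
    allowed = surprisal < mu              # truncate tokens "too surprising" for mu
    if not allowed.any():
        allowed[np.argmax(probs)] = True  # never empty the distribution
    kept = np.where(allowed, probs, 0.0)
    kept = kept / kept.sum()
    token = int(rng.choice(len(probs), p=kept))
    observed = -np.log2(kept[token])
    mu -= eta * (observed - tau)          # feedback: steer surprisal toward tau
    return token, mu                      # start the loop with mu = 2 * tau
```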
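
Dynamic temperature as a sketch: map the normalized entropy of the distribution into a [min_temp, max_temp] range (the exponent shapes the mapping; all defaults are illustrative):

```python
import numpy as np

def dynamic_temperature(logits: np.ndarray, min_temp: float = 0.5,
                        max_temp: float = 1.5, exponent: float = 1.0) -> np.ndarray:
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    max_entropy = np.log(len(probs))      # entropy of a uniform distribution
    normalized = (entropy / max_entropy) ** exponent
    # Low entropy (confident) -> temperature near min_temp; high -> near max_temp.
    temperature = min_temp + (max_temp - min_temp) * normalized
    return logits / temperature
```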
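
Finally, a toy end-to-end pipeline illustrating the order described above: penalties on raw logits, then temperature, then softmax, then truncation, then sampling. Everything about it (the presence-only penalty, the parameter defaults) is simplified for illustration:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, context_ids: list[int],
                      presence: float = 0.3, temperature: float = 0.8,
                      top_k: int = 40, top_p: float = 0.95,
                      rng: np.random.Generator = np.random.default_rng()) -> int:
    out = logits.copy()
    for token_id in set(context_ids):     # 1. penalties act on the raw logits
        out[token_id] -= presence
    out /= temperature                    # 2. temperature rescales the logits
    exp = np.exp(out - out.max())
    probs = exp / exp.sum()               # 3. softmax to probabilities
    cutoff = np.sort(probs)[-min(top_k, len(probs))]
    probs = np.where(probs >= cutoff, probs, 0.0)  # 4. top-k truncation
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]       # 5. top-p truncation on the survivors
    keep = np.cumsum(probs[order]) - probs[order] < top_p
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))  # 6. sample from what is left
```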