Hasty Briefs

Dummy's Guide to Modern LLM Sampling

a year ago
  • #Text Generation
  • #LLM
  • #Sampling
  • LLMs generate text by predicting the next token based on probabilities learned during training.
  • Tokens are sub-words used instead of whole words or letters for efficiency and semantic understanding.
  • Sampling introduces controlled randomness to avoid repetitive and deterministic outputs.
  • Temperature rescales the probability distribution before sampling: values above 1 flatten it for more varied output, values below 1 sharpen it toward the most likely tokens.
  • Presence Penalty discourages repeating any token that has appeared before.
  • Frequency Penalty reduces the likelihood of tokens based on their occurrence count.
  • Repetition Penalty scales the logits of every previously seen token (from both the prompt and the generated text): positive logits are divided by the penalty, negative logits multiplied by it.
  • DRY (Don't Repeat Yourself) prevents repetitive n-gram patterns by penalizing continuations of existing patterns.
  • Top-K restricts the model to consider only the top K most likely tokens.
  • Top-P selects the smallest set of tokens whose cumulative probability exceeds a threshold P.
  • Min-P sets a dynamic threshold relative to the highest probability token.
  • Top-A keeps only tokens whose probability exceeds a threshold proportional to the square of the highest token's probability.
  • XTC (eXclude Top Choices) occasionally excludes the most likely tokens to encourage diversity.
  • Top-N-Sigma uses standard deviation to set an adaptive threshold for token selection.
  • Tail-Free Sampling identifies the 'elbow point' in the probability distribution to filter out the long tail.
  • Eta Cutoff adjusts the threshold based on the entropy of the distribution.
  • Epsilon Cutoff uses a fixed probability threshold to eliminate unlikely tokens.
  • Locally Typical Sampling selects tokens based on how close their surprisal is to the average.
  • Quadratic Sampling reshapes the probability distribution using quadratic and cubic transformations.
  • Mirostat Sampling maintains consistent perplexity by dynamically adjusting the sampling threshold.
  • Dynamic Temperature Sampling adjusts temperature based on the entropy of the distribution.
  • Beam Search explores multiple paths simultaneously to find the best overall sequence.
  • Contrastive Search balances likelihood and diversity by penalizing similarity to the context.
  • Sampler Order affects the final output significantly, with typical pipelines applying penalties first, then temperature, and finally filtering methods.
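The temperature bullet above can be made concrete with a short sketch in plain Python (the logit values are hypothetical):

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """Divide logits by T: T < 1 sharpens the distribution, T > 1 flattens it."""
    return [x / temperature for x in logits]

logits = [2.0, 1.0, 0.5, 0.1]            # hypothetical next-token logits
base  = softmax(logits)                   # T = 1
sharp = softmax(apply_temperature(logits, 0.5))
flat  = softmax(apply_temperature(logits, 2.0))
# Lower temperature boosts the top token; higher temperature spreads mass out.
assert sharp[0] > base[0] > flat[0]
```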
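Top-K, Top-P, and Min-P are all truncation filters over the same distribution; a minimal sketch of each, using a small hypothetical distribution:

```python
def top_k(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

def top_p(probs, p_threshold):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p_threshold, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p_threshold:
            break
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

def min_p(probs, ratio):
    """Keep tokens whose probability is at least ratio * max(probs)."""
    cutoff = ratio * max(probs)
    filtered = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(filtered)
    return [q / total for q in filtered]

probs = [0.5, 0.3, 0.15, 0.05]   # hypothetical next-token distribution
print(top_k(probs, 2))           # the two most likely tokens survive, renormalized
print(top_p(probs, 0.75))        # smallest prefix covering 75% of the mass
print(min_p(probs, 0.2))         # keep tokens with p >= 0.2 * 0.5 = 0.1
```

Note how Min-P's cutoff scales with the confidence of the top token, which is what makes it adaptive where Top-K is fixed.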
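The presence and frequency penalties described above are commonly applied as subtractions from the logits of already-generated tokens; a sketch under that assumption (the penalty values are illustrative):

```python
from collections import Counter

def penalize(logits, generated_ids, presence_penalty, frequency_penalty):
    """Subtract penalties from the logits of already-generated tokens:
    the presence penalty is a flat cost for appearing at all, while the
    frequency penalty grows with the occurrence count."""
    counts = Counter(generated_ids)
    out = list(logits)
    for token_id, count in counts.items():
        out[token_id] -= presence_penalty            # appeared at least once
        out[token_id] -= frequency_penalty * count   # per occurrence
    return out

logits = [2.0, 1.0, 0.5]
# Token 0 was generated twice, token 1 once, token 2 never.
penalized = penalize(logits, [0, 1, 0], presence_penalty=0.5, frequency_penalty=0.2)
```

Token 0 pays the presence penalty once but the frequency penalty twice, which is the practical difference between the two.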
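Finally, the note on sampler order can be illustrated end to end; this hypothetical pipeline applies a repetition penalty first, then temperature, then Top-K filtering, and only then samples:

```python
import math
import random

def sample_next(logits, generated_ids, *, rep_penalty=1.1,
                temperature=0.8, k=40, rng=random):
    """Sketch of a typical pipeline order: penalties -> temperature ->
    filtering -> sample. All parameter values are illustrative."""
    logits = list(logits)
    # 1. Repetition penalty: divide positive logits, multiply negative ones.
    for t in set(generated_ids):
        logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
    # 2. Temperature scaling.
    logits = [x / temperature for x in logits]
    # 3. Top-K: drop everything outside the K most likely tokens.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    allowed = set(order[:k])
    # 4. Softmax over the survivors, then draw a token.
    m = max(logits[i] for i in allowed)
    weights = [math.exp(logits[i] - m) if i in allowed else 0.0
               for i in range(len(logits))]
    total = sum(weights)
    r, cum = rng.random() * total, 0.0
    for i, w in enumerate(weights):
        cum += w
        if cum >= r:
            return i
    return len(logits) - 1

token = sample_next([3.0, 1.0, -1.0, 0.5], generated_ids=[0], k=3)
```

Reordering these stages changes the result: for example, filtering before applying penalties can keep a token in the pool that the penalty would otherwise have pushed below the cutoff.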