Hasty Briefs (beta)

How LLMs Work

2 days ago
  • #AI Explainability
  • #Transformer Mechanisms
  • #LLM Architecture
  • Tokenization converts text into tokens: subword pieces, each represented by an integer ID.
  • Embeddings give meaning to tokens by mapping token IDs to learned vectors in a high-dimensional space.
  • Positional encoding, like Rotary Position Embeddings (RoPE), provides order information by rotating token vectors based on position.
  • Attention mechanisms let tokens interact: similarity scores between one token's query and the other tokens' keys determine how heavily each token's value is weighted.
  • Multi-head attention runs multiple attention passes in parallel, letting individual heads specialize in different linguistic relationships.
  • Feed-forward networks process each token independently with non-linear transformations and are thought to store much of the model's factual knowledge.
  • Residual connections and layer normalization (e.g., RMSNorm) stabilize training in deep networks: residual connections keep gradients flowing, and normalization keeps vector scales under control.
  • Next-token prediction generates text by converting the output vector at the final position into logits over the vocabulary, applying softmax, and sampling with decoding settings like temperature.
  • Model differences arise from trained weights, configurations (e.g., number of layers, MoE), and post-training techniques like instruction tuning.
  • Modern LLMs share a transformer-based architecture, with innovations like speculative decoding improving efficiency in generation.
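
The positional-encoding point above can be made concrete with a minimal sketch of RoPE. It assumes the adjacent-pair layout (dimensions 0 and 1 rotated together, then 2 and 3, and so on) and the commonly used frequency base of 10000; real implementations differ in layout and base, and the function name `rope` and the toy vectors are hypothetical.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by an angle that grows with position.

    Assumption: adjacent-pair layout and base 10000; other layouts exist.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Earlier pairs rotate quickly, later pairs rotate slowly.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made-up query and key vectors, purely for illustration.
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.7, 0.9, 0.4]

# Both pairs of positions are 2 apart, so the rotated dot products match.
near = dot(rope(q, 5), rope(k, 3))
far = dot(rope(q, 105), rope(k, 103))
```

Because a rotation preserves vector length, and the dot product of two rotated vectors depends only on the difference between their angles, the attention score between a query and a key ends up encoding their relative distance rather than their absolute positions.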
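
The attention point above can be illustrated with a small single-head sketch in plain Python. It assumes scaled dot-product scoring and a causal mask (each token sees only itself and earlier tokens), as is usual in decoder-only LLMs; the function names and the toy vectors are hypothetical, and a real model would first produce queries, keys, and values with learned projections.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Returns (outputs, weights); weights[i][j] is how much token i
    attends to token j, and is zero for every j > i.
    """
    d = len(keys[0])
    n = len(keys)
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Score the query of token i against the keys of tokens 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # The output is the weighted average of the visible value vectors.
        out = [
            sum(w[j] * values[j][c] for j in range(i + 1))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w + [0.0] * (n - i - 1))
    return outputs, weights

# Toy 3-token example with made-up 2-dimensional vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = causal_attention(Q, K, V)
```

Each row of `weights` sums to 1, so every output is a blend of value vectors. The first token can only attend to itself, which is why its output equals its own value vector.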
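
The next-token prediction point above reduces to a few lines once the logits exist. The sketch below assumes the logits are already given as a plain list of floats, one per vocabulary entry; the function names and the three-entry "vocabulary" are hypothetical, and production decoders typically layer further settings such as top-k or top-p on top of this.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token ID at random, weighted by the softmax probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Made-up logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more peaked
flat = softmax_with_temperature(logits, temperature=2.0)   # more uniform
token_id = sample_next_token(logits, temperature=0.8, rng=random.Random(0))
```

Lower temperatures concentrate probability on the highest-scoring token, making output more predictable; higher temperatures flatten the distribution and make less likely tokens more common.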