Hasty Briefs (beta)

A 20-Year-Old Algorithm Can Help Us Understand Transformer Embeddings

14 days ago
  • #LLM-interpretability
  • #dictionary-learning
  • #KSVD-algorithm
  • Understanding internal states of LLMs by decomposing embeddings into interpretable concept vectors.
  • Dictionary learning as a method to break down complex embeddings into simpler, interpretable elements.
  • The superposition hypothesis (Elhage et al., 2022), under which a model encodes more features than it has dimensions by assigning them nearly orthogonal directions, as support for the decomposition approach.
  • Use of sparse autoencoders (SAEs) by Bricken et al. in 2023 for dictionary learning on large datasets.
  • Revival of the KSVD algorithm in a modified form, DB-KSVD, that runs significantly faster than the original.
  • Competitive performance of DB-KSVD against SAEs on the SAEBench benchmark across multiple metrics.
  • Discussion of the feasibility of dictionary learning, including how it depends on dataset size and embedding dimension.
  • Potential applications of dictionary learning beyond language models, such as in robotics and vision tasks.
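
To make the dictionary learning idea in the points above concrete, here is a minimal sketch of the classic KSVD loop in NumPy: alternate between sparse coding each embedding with a few dictionary atoms and updating each atom with a rank-1 SVD. It does not reproduce the DB-KSVD modifications, which this summary does not describe. The function names `omp` and `ksvd`, the synthetic data, and all sizes are illustrative assumptions.

```python
import numpy as np


def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x with at most k columns of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z


def ksvd(X, n_atoms, k, n_iter=5, seed=0):
    """X is (d, n), one embedding per column. Returns D (d, n_atoms) and codes Z (n_atoms, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Initialise atoms from random data columns, scaled to unit length.
    D = X[:, rng.choice(n, n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding step: each column gets at most k active atoms.
        Z = np.column_stack([omp(D, X[:, i], k) for i in range(n)])
        # Dictionary update step: refit one atom at a time.
        for j in range(n_atoms):
            users = np.nonzero(Z[j])[0]
            if users.size == 0:
                continue  # atom unused in this pass; leave it as is
            # Reconstruction error with atom j's contribution added back,
            # restricted to the samples that use atom j.
            E = X[:, users] - D @ Z[:, users] + np.outer(D[:, j], Z[j, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]            # best rank-1 direction becomes the atom
            Z[j, users] = S[0] * Vt[0]   # matching coefficients
    return D, Z


# Synthetic stand-in for embeddings: sparse mixes of 32 hidden directions in 16 dimensions.
rng = np.random.default_rng(0)
true_D = rng.normal(size=(16, 32))
true_D /= np.linalg.norm(true_D, axis=0)
codes = np.zeros((32, 400))
for i in range(400):
    idx = rng.choice(32, 3, replace=False)
    codes[idx, i] = rng.normal(size=3)
X = true_D @ codes

D, Z = ksvd(X, n_atoms=32, k=3)
rel_err = np.linalg.norm(X - D @ Z) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In an interpretability setting, the columns of `X` would be model embeddings and the columns of `D` the candidate concept vectors. The dictionary here has more atoms (32) than dimensions (16), which is the overcomplete regime that the superposition hypothesis motivates.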