Hasty Briefs

Semantic Structure in Large Language Model Embeddings

12 days ago
  • #LLM embeddings
  • #semantic structure
  • #low-dimensional
  • Human ratings of words across semantic scales can be reduced to a low-dimensional form with minimal information loss.
  • LLM embeddings show similar semantic structure, with projections on antonym-pair-defined directions correlating highly with human ratings.
  • Semantic features in LLMs reduce to a 3D subspace, resembling patterns from human survey responses.
  • Shifting tokens along one semantic direction causes off-target effects on other features, proportional to the cosine similarity between the aligned feature directions.
  • Semantic features in LLMs are entangled much as they are in human language, and much of the semantic information is surprisingly low-dimensional.
  • Accounting for semantic structure is essential to avoid unintended consequences when steering features in LLMs.
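The mechanics behind these bullets can be sketched in a few lines. The example below is a toy illustration, not the paper's code: it uses random stand-in vectors in place of real LLM token embeddings, defines a semantic direction from an antonym pair, projects a word onto it, and shows why steering along one axis leaks onto another axis in proportion to the cosine similarity between the two directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in embeddings; in practice these would be rows of
# an LLM's token-embedding matrix.
dim = 64
emb = {w: rng.normal(size=dim) for w in ["hot", "cold", "big", "small", "warm"]}

def semantic_direction(pos, neg):
    """Unit vector along an antonym-pair axis, e.g. hot minus cold."""
    d = emb[pos] - emb[neg]
    return d / np.linalg.norm(d)

temp_axis = semantic_direction("hot", "cold")
size_axis = semantic_direction("big", "small")

# Projection of a token onto a semantic axis -- the quantity the paper
# compares against human ratings on that scale.
score = emb["warm"] @ temp_axis

# Steering: shift a token along the temperature axis by alpha, then
# measure the induced shift along the (unrelated) size axis. Because
# both axes are unit vectors, the off-target shift is exactly
# alpha * cos(temp_axis, size_axis).
alpha = 2.0
steered = emb["warm"] + alpha * temp_axis
off_target = (steered - emb["warm"]) @ size_axis
cos_sim = temp_axis @ size_axis
assert np.isclose(off_target, alpha * cos_sim)
```

The final assertion is the point of the last two bullets: unless the feature directions are exactly orthogonal (cosine similarity zero), any steering intervention moves other semantic features too, so steering methods need to account for this entanglement.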