Semantic Structure in Large Language Model Embeddings
- #LLM embeddings
- #semantic structure
- #low-dimensional
- Human ratings of words across semantic scales can be compressed into a low-dimensional representation with minimal information loss.
- LLM embeddings exhibit similar semantic structure: projections onto directions defined by antonym pairs correlate highly with human ratings (see the projection sketch after this list).
- Semantic features in LLMs reduce to a roughly three-dimensional subspace, resembling the structure found in human survey responses (see the PCA sketch below).
- Shifting a token's embedding along one semantic direction causes off-target effects on other features, in proportion to the cosine similarity between the aligned feature directions (see the steering sketch below).
- Semantic features in LLMs are entangled much as they are in human language, and much of the semantic information they carry is surprisingly low-dimensional.
- Accounting for semantic structure is essential to avoid unintended consequences when steering features in LLMs.
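
The antonym-pair projection idea can be sketched in a few lines. This is a minimal illustration, not the paper's code: the embeddings are random stand-ins, and `emb`, `semantic_direction`, `project`, and the human ratings are hypothetical names and placeholder values; only numpy and scipy are assumed.

```python
# Sketch: score words on a "good-bad" scale via projection onto an
# antonym-pair direction, then correlate with human ratings.
# Embeddings and ratings below are random/placeholder stand-ins.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
dim = 768  # assumed embedding width

# Stand-in vectors; in practice these come from an LLM's embedding matrix.
vocab = ["good", "bad", "gift", "poison", "sunset", "tax"]
emb = {w: rng.normal(size=dim) for w in vocab}

def semantic_direction(pos: str, neg: str) -> np.ndarray:
    """Unit vector pointing from the negative pole to the positive pole."""
    d = emb[pos] - emb[neg]
    return d / np.linalg.norm(d)

def project(word: str, direction: np.ndarray) -> float:
    """Coordinate of a word's embedding along a semantic direction."""
    return float(emb[word] @ direction)

good_bad = semantic_direction("good", "bad")
words = ["gift", "poison", "sunset", "tax"]
scores = [project(w, good_bad) for w in words]
human_ratings = [4.5, 1.2, 4.1, 2.0]  # placeholder 1-5 survey means

rho, _ = spearmanr(scores, human_ratings)
print(f"Spearman rho vs. human ratings: {rho:.2f}")
```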
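The low-dimensionality claim amounts to running PCA on a words-by-scales matrix of such projections. The sketch below uses synthetic data with a hidden three-factor structure just to show the shape of the check; `X`, the factor count, and the noise level are assumptions, not results from the source.

```python
# Sketch: PCA on a (words x scales) matrix of semantic projections.
# X is synthetic, built from 3 hidden factors plus noise, so the first
# three components dominate by construction; real data would be the test.
import numpy as np

rng = np.random.default_rng(1)
n_words, n_scales = 500, 12

latent = rng.normal(size=(n_words, 3))       # hidden 3D structure
loadings = rng.normal(size=(3, n_scales))    # how scales mix the factors
X = latent @ loadings + 0.1 * rng.normal(size=(n_words, n_scales))

Xc = X - X.mean(axis=0)                      # center columns
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("Variance explained by first 3 PCs:", round(explained[:3].sum(), 3))
```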
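The off-target effect has a simple geometric reading: adding `alpha * u` to an embedding moves its coordinate along any other unit direction `v` by exactly `alpha * cos(u, v)`, so correlated features move together. A minimal steering sketch, with `u`, `v`, `alpha`, and the dimensionality as hypothetical values:

```python
# Sketch: steering along direction u shifts the coordinate along any other
# unit direction v by exactly alpha * cos(u, v) -- the off-target effect.
import numpy as np

rng = np.random.default_rng(2)
dim = 768  # assumed embedding width

def unit(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x)

u = unit(rng.normal(size=dim))                  # steered feature direction
v = unit(0.6 * u + 0.8 * rng.normal(size=dim))  # a correlated second feature

token = rng.normal(size=dim)
alpha = 5.0
steered = token + alpha * u                     # shift token along u

delta_v = (steered - token) @ v                 # observed off-target shift
print(f"cos(u, v) = {u @ v:.3f}")
print(f"off-target shift = {delta_v:.3f}, predicted = {alpha * (u @ v):.3f}")
```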