king - man + woman is queen; but why?
- #natural-language-processing
- #word2vec
- #machine-learning
- word2vec transforms words into vectors, allowing analogies like 'king - man + woman = queen'.
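The analogy trick is just vector arithmetic plus a nearest-neighbor search. A minimal sketch with hypothetical, hand-made 3-dimensional vectors (real word2vec embeddings have hundreds of dimensions learned from a corpus):

```python
import numpy as np

# Hypothetical toy vectors; values chosen by hand for illustration only.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = vectors[a] - vectors[b] + vectors[c]
    # Exclude the query words themselves, as is standard in analogy evaluation.
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("king", "man", "woman"))  # → queen
```

Note that the answer is the *nearest* remaining word, not an exact equality: `king - man + woman` rarely lands exactly on `queen` in real embedding spaces.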
- The algorithm relies on word co-occurrences and the distributional hypothesis: words are characterized by their context.
- Pointwise Mutual Information (PMI) is used to measure how much more likely word pairs appear together than by chance.
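PMI compares the observed co-occurrence probability of a pair with what independence would predict. A self-contained sketch over hypothetical toy counts (the word/context pairs and their frequencies are made up for illustration):

```python
import math
from collections import Counter

# Hypothetical (word, context) co-occurrence counts from a toy corpus.
pairs = Counter({
    ("ice", "cold"): 8, ("ice", "water"): 6, ("ice", "fashion"): 1,
    ("steam", "hot"): 7, ("steam", "water"): 5, ("steam", "fashion"): 1,
})

total = sum(pairs.values())
word_counts, ctx_counts = Counter(), Counter()
for (w, c), n in pairs.items():
    word_counts[w] += n
    ctx_counts[c] += n

def pmi(w, c):
    """log2 of how much more often w and c co-occur than if they were independent."""
    p_wc = pairs[(w, c)] / total
    p_w = word_counts[w] / total
    p_c = ctx_counts[c] / total
    return math.log2(p_wc / (p_w * p_c))

print(pmi("ice", "cold"))     # positive: co-occurs more than chance
print(pmi("ice", "fashion"))  # negative: co-occurs less than chance
```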
- Word vectors form a linear space where similar words are close, enabling analogies through vector arithmetic.
- Analogies in word2vec can represent meaning (gender changes), grammar (tense changes), or other relationships.
- Differences between word vectors (e.g., 'woman - man') can reveal semantic relationships like gender.
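A difference like `woman - man` acts as a direction in the vector space; projecting other words onto it separates gendered pairs. A sketch with hypothetical 2-D vectors where the second coordinate happens to encode "femaleness" (real embeddings learn such directions implicitly, not by design):

```python
import numpy as np

# Hypothetical toy embeddings, chosen by hand for illustration.
vec = {
    "man":   np.array([0.9, 0.1]),
    "woman": np.array([0.9, 0.9]),
    "king":  np.array([0.2, 0.1]),
    "queen": np.array([0.2, 0.9]),
    "apple": np.array([0.5, 0.5]),
}

gender = vec["woman"] - vec["man"]  # a difference vector as a "gender direction"

# Projection onto the direction: gendered pairs land at opposite ends,
# while a neutral word like "apple" sits in between.
for w, v in vec.items():
    print(f"{w:6s} {v @ gender:+.2f}")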
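```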
- Pre-trained word vectors and tools like GloVe and TensorFlow's Embedding Projector allow exploration and visualization.
- Technical details include two sets of vectors per word (word and context vectors) and the use of positive PMI (PPMI), which clips negative values to zero, on practical datasets.
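The count-based route the post alludes to can be sketched end to end: build a word-context count matrix, convert it to PPMI, then factor it with SVD to get dense vectors. The counts below are hypothetical toy values; only the pipeline is the point.

```python
import numpy as np

words = ["king", "queen", "man", "woman"]
contexts = ["crown", "royal", "person"]
# Hypothetical word-context co-occurrence counts.
counts = np.array([
    [8, 6, 1],   # king
    [7, 6, 1],   # queen
    [0, 1, 9],   # man
    [1, 1, 8],   # woman
], dtype=float)

total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total

with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)  # PPMI: clip negative (and undefined) PMI to 0

# Truncated SVD: rows of U * S are dense word vectors.
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :2] * S[:2]
print(dict(zip(words, embeddings.round(2))))
```

With these counts, `king` and `queen` get near-identical rows and hence near-identical embeddings, mirroring how similar distributions yield nearby vectors.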
- Word embeddings can reflect biases present in the training data, such as 'doctor - man + woman = nurse'.
- Resources for further learning include TensorFlow tutorials, GloVe, and critiques of word2vec assumptions.