Hasty Briefs (beta)

King – man + woman is queen; but why?

22 days ago
  • #natural-language-processing
  • #word2vec
  • #machine-learning
  • word2vec transforms words into vectors, allowing analogies like 'king - man + woman = queen'.
  • The algorithm relies on word co-occurrences and the distributional hypothesis: a word is characterized by the company it keeps.
  • Pointwise Mutual Information (PMI) is used to measure how much more likely word pairs appear together than by chance.
  • Word vectors form a linear space where similar words are close, enabling analogies through vector arithmetic.
  • Analogies in word2vec can capture semantic shifts (e.g., gender), grammatical shifts (e.g., tense), or other relationships.
  • Differences between word vectors (e.g., 'woman - man') can reveal semantic relationships like gender.
  • Pre-trained word vectors and tools like GloVe and TensorFlow's Embedding Projector allow exploration and visualization.
  • Technical details include two sets of vectors per word (word and context vectors) and the use of PPMI (positive PMI, which clips negative values to zero) on practical datasets.
  • Word embeddings can reflect biases present in the training data, such as 'doctor - man + woman = nurse'.
  • Resources for further learning include TensorFlow tutorials, GloVe, and critiques of word2vec assumptions.
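To make the PMI idea above concrete, here is a minimal sketch that computes PPMI from raw co-occurrence counts on a tiny hand-made corpus. The corpus, the sentence-wide co-occurrence window, and the word choices are all hypothetical simplifications; real word2vec training uses a sliding window over large text collections.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, standing in for real training text.
corpus = [
    ["king", "rules", "kingdom"],
    ["queen", "rules", "kingdom"],
    ["man", "walks"],
    ["woman", "walks"],
]

# Count single words and co-occurring pairs (window = whole sentence here).
word_counts = Counter(w for sent in corpus for w in sent)
pair_counts = Counter()
for sent in corpus:
    for a, b in combinations(sent, 2):
        pair_counts[frozenset((a, b))] += 1

total_words = sum(word_counts.values())
total_pairs = sum(pair_counts.values())

def ppmi(w1, w2):
    """Positive PMI: how much more often w1 and w2 co-occur than chance.

    PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ), clipped at zero.
    """
    p_pair = pair_counts[frozenset((w1, w2))] / total_pairs
    if p_pair == 0:
        return 0.0
    p_w1 = word_counts[w1] / total_words
    p_w2 = word_counts[w2] / total_words
    return max(0.0, math.log2(p_pair / (p_w1 * p_w2)))

# "king" co-occurs with "kingdom" but never with "walks":
print(ppmi("king", "kingdom"))  # positive
print(ppmi("king", "walks"))    # 0.0
```

Word vectors can be obtained from the resulting PPMI matrix (e.g., by dimensionality reduction), which is one way to see why word2vec-style embeddings reflect co-occurrence statistics.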
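The vector-arithmetic analogy can likewise be sketched with hand-crafted two-dimensional embeddings. The vectors below are invented for illustration (one axis loosely "royalty", the other "gender"); real embeddings are learned, live in hundreds of dimensions, and would be queried the same way via cosine similarity.

```python
import numpy as np

# Hypothetical toy embeddings: dim 0 ~ "royalty", dim 1 ~ "gender".
vectors = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def cosine(a, b):
    """Cosine similarity: angle-based closeness of two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

Excluding the query words themselves is the standard trick: the nearest neighbor of `king - man + woman` is often `king` itself, so implementations (e.g., gensim's `most_similar`) filter the inputs out before ranking.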