Introduction and Word Vectors

Keynote

  • A word is a signifier that maps to a signified (idea or thing).
  • Word representation:
    • As a discrete symbol:
      • One-hot encoding:
        • Represent each word as a one-hot vector (see the one-hot sketch after this list).
        • Cons:
          • Vector dimension = number of words in the vocabulary.
          • No natural notion of similarity: any two distinct one-hot vectors are orthogonal.
    • By the context:
      • “You shall know a word by the company it keeps” (J. R. Firth 1957: 11)
      • Distributed representation: Word vectors = word embeddings = word representations.
      • SVD-based methods: build a count matrix, then use SVD to reduce its dimensionality (see the co-occurrence/SVD sketch after this list).
        • Word-document matrix: captures hidden topics (i.e., Latent Semantic Analysis, LSA).
        • Window-based co-occurrence matrix: captures both semantic and syntactic information.
        • Cons:
          • The dimensions of the matrix change whenever new words are added to the vocabulary.
          • The matrix is extremely sparse and very high-dimensional in general.
          • Quadratic cost to train ...
        • Solutions:
          • Ignore stopwords.
          • Apply a ramp window (weight co-occurrence counts by the distance between the words).
          • Use Pearson correlation and set negative counts ...
      • Iteration-based methods:
        • Language models: assign a probability to a sequence of words.
        • Word2vec (see the skip-gram sketch after this list):
          • 2 algorithms:
            • Continuous bag-of-words (CBOW).
            • Skip-gram.
          • 2 training methods:
            • Negative sampling.
            • Hierarchical softmax.
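
A minimal sketch of the one-hot representation and its main weakness, using a hypothetical three-word vocabulary: every pair of distinct one-hot vectors is orthogonal, so the encoding carries no notion of similarity.

```python
import numpy as np

# Hypothetical toy vocabulary; any word list behaves the same way.
vocab = ["hotel", "motel", "cat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Distinct one-hot vectors are orthogonal, so their dot product is 0:
# "hotel" and "motel" look no more similar than "hotel" and "cat".
print(one_hot("hotel") @ one_hot("motel"))  # 0.0
print(one_hot("hotel") @ one_hot("cat"))    # 0.0
```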
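A minimal sketch of the window-based co-occurrence matrix plus SVD approach, assuming a tiny made-up corpus and a symmetric window of size 1; the truncated left singular vectors, scaled by the singular values, serve as dense word vectors.

```python
import numpy as np

# Hypothetical toy corpus; in practice this would be a large collection of sentences.
corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
window = 1  # symmetric context window size

# Build the window-based co-occurrence matrix X.
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1

# Truncated SVD: keep the top-k singular directions as dense word vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]
print(dict(zip(vocab, word_vectors.round(2))))
```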
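A minimal skip-gram with negative sampling sketch in plain NumPy, assuming a toy corpus and arbitrary hyperparameters; negatives are drawn uniformly here rather than from the unigram distribution raised to the 3/4 power used by word2vec, and subsampling and hierarchical softmax are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus and hyperparameters; a real run needs far more data.
corpus = "i like deep learning i like nlp i enjoy flying".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, d, window, neg_k, lr = len(vocab), 8, 2, 3, 0.05

# Two embedding matrices: center-word vectors and context ("outside") vectors.
W_center = rng.normal(scale=0.1, size=(V, d))
W_context = rng.normal(scale=0.1, size=(V, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    for i, word in enumerate(corpus):
        c = idx[word]
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            o = idx[corpus[j]]
            # One true (center, context) pair plus neg_k sampled negatives.
            # Negatives are sampled uniformly for simplicity.
            negatives = rng.integers(0, V, size=neg_k)
            v_c = W_center[c]
            # Positive pair: push sigmoid(u_o . v_c) toward 1.
            g = sigmoid(W_context[o] @ v_c) - 1.0
            grad_c = g * W_context[o]
            W_context[o] -= lr * g * v_c
            # Negative pairs: push sigmoid(u_neg . v_c) toward 0.
            for n in negatives:
                gn = sigmoid(W_context[n] @ v_c)
                grad_c += gn * W_context[n]
                W_context[n] -= lr * gn * v_c
            W_center[c] -= lr * grad_c

# Cosine similarity between learned center vectors as a sanity check.
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(W_center[idx["deep"]], W_center[idx["learning"]]))
```

Swapping the roles of center and context words in the training loop would give the CBOW variant; replacing negative sampling with a tree-structured output layer would give hierarchical softmax.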