Introduction and Word Vectors

Keynote

  • A word is a signifier that maps to a signified (idea or thing).
  • Word representation:
    • As a discrete symbol:
      • One-hot encoding:
        • Represent each word as a one-hot vector (see the one-hot sketch after this list).
        • Cons:
          • Vector dimension = number of words in the vocabulary.
          • No natural notion of similarity: any two distinct one-hot vectors are orthogonal.
    • By the context:
      • “You shall know a word by the company it keeps” (J. R. Firth 1957: 11)
      • Distributed representation: Word vectors = word embeddings = word representations.
      • SVD-based methods: build a count matrix, then use SVD to reduce its dimensionality (see the co-occurrence/SVD sketch after this list).
        • Word-document matrix: captures hidden topics (i.e., Latent Semantic Analysis, LSA).
        • Window-based co-occurrence matrix: captures both semantic and syntactic information.
        • Cons:
          • The dimensions of the matrix change whenever new words are added to the vocabulary.
          • The matrix is extremely sparse and very high-dimensional in general.
          • Quadratic cost to train ...
        • Solutions:
          • Ignore stopwords.
          • Apply a ramp window (weight co-occurrence counts by the distance between the words).
          • Use Pearson correlation and set negative counts ...
      • Iteration-based methods:
        • Language models: assign a probability to a sequence of words.
        • Word2vec (see the skip-gram sketch after this list):
          • 2 algorithms:
            • Continuous bag-of-words (CBOW).
            • Skip-gram.
          • 2 training methods:
            • Negative sampling.
            • Hierarchical softmax.
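
A minimal sketch of the one-hot representation and its main weakness, using a hypothetical three-word vocabulary: every pair of distinct one-hot vectors is orthogonal, so the encoding carries no notion of similarity.

```python
import numpy as np

# Hypothetical toy vocabulary; any word list behaves the same way.
vocab = ["hotel", "motel", "cat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Distinct one-hot vectors are orthogonal, so their dot product is 0:
# "hotel" and "motel" look no more similar than "hotel" and "cat".
print(one_hot("hotel") @ one_hot("motel"))  # 0.0
print(one_hot("hotel") @ one_hot("cat"))    # 0.0
```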
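A minimal sketch of the window-based co-occurrence matrix plus SVD approach, assuming a tiny made-up corpus and a symmetric window of size 1; the truncated left singular vectors, scaled by the singular values, serve as dense word vectors.

```python
import numpy as np

# Hypothetical toy corpus; in practice this would be a large collection of sentences.
corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
window = 1  # symmetric context window size

# Build the window-based co-occurrence matrix X.
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1

# Truncated SVD: keep the top-k singular directions as dense word vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]
print(dict(zip(vocab, word_vectors.round(2))))
```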
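A minimal skip-gram with negative sampling sketch in plain NumPy, assuming a toy corpus and arbitrary hyperparameters; negatives are drawn uniformly here rather than from the unigram distribution raised to the 3/4 power used by word2vec, and subsampling and hierarchical softmax are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus and hyperparameters; a real run needs far more data.
corpus = "i like deep learning i like nlp i enjoy flying".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, d, window, neg_k, lr = len(vocab), 8, 2, 3, 0.05

# Two embedding matrices: center-word vectors and context ("outside") vectors.
W_center = rng.normal(scale=0.1, size=(V, d))
W_context = rng.normal(scale=0.1, size=(V, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    for i, word in enumerate(corpus):
        c = idx[word]
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            o = idx[corpus[j]]
            # One true (center, context) pair plus neg_k sampled negatives.
            # Negatives are sampled uniformly for simplicity.
            negatives = rng.integers(0, V, size=neg_k)
            v_c = W_center[c]
            # Positive pair: push sigmoid(u_o . v_c) toward 1.
            g = sigmoid(W_context[o] @ v_c) - 1.0
            grad_c = g * W_context[o]
            W_context[o] -= lr * g * v_c
            # Negative pairs: push sigmoid(u_neg . v_c) toward 0.
            for n in negatives:
                gn = sigmoid(W_context[n] @ v_c)
                grad_c += gn * W_context[n]
                W_context[n] -= lr * gn * v_c
            W_center[c] -= lr * grad_c

# Cosine similarity between learned center vectors as a sanity check.
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(W_center[idx["deep"]], W_center[idx["learning"]]))
```

Swapping the roles of center and context words in the training loop would give the CBOW variant; replacing negative sampling with a tree-structured output layer would give hierarchical softmax.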