- A word is a signifier that maps to a signified (idea or thing).
- Word representation:
- As discrete symbol:
- One-hot encoding:
- Represent each word by a one-hot vector (see the sketch below).
- Cons:
- Vector dimension = number of words in vocabulary.
- No natural notion of similarity.
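A minimal sketch of one-hot encoding, using a made-up toy vocabulary for illustration:

```python
import numpy as np

# Hypothetical toy vocabulary; real vocabularies contain 10^5-10^6 words,
# so each one-hot vector would be that long.
vocab = ["hotel", "motel", "cat", "dog"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# "hotel" and "motel" are related words, yet their one-hot vectors are
# orthogonal: the dot product is 0, so there is no notion of similarity.
print(one_hot("hotel") @ one_hot("motel"))  # 0.0
```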
- By the context:
- “You shall know a word by the company it keeps” (J. R. Firth 1957: 11)
- Distributed representation: Word vectors = word embeddings = word representations.
- SVD-based methods: use SVD to reduce the dimensionality of a word count matrix (see the sketch after this list).
- Word-document matrix: captures hidden topics (e.g., Latent Semantic Analysis, LSA).
- Window-based co-occurrence matrix: captures both semantic and syntactic information.
- Cons:
- The dimensions of the matrix change very often (new words are added frequently and the corpus changes in size).
- The matrix is extremely sparse and very high-dimensional in general.
- Quadratic cost to train (i.e., to perform the SVD).
- Solution:
- Ignore stopwords
- Apply a ramp window, i.e., weight co-occurrence counts by the distance between the words.
- Use Pearson correlation instead of raw counts, and set negative values to zero.
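A minimal sketch of the window-based co-occurrence matrix followed by SVD; the corpus, window size, and target dimension k below are made-up toy values:

```python
import numpy as np

# Toy corpus for illustration only.
corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]
window = 1  # symmetric context window size

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Build the |V| x |V| window-based co-occurrence matrix.
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1

# Reduce dimensionality with SVD, keeping the top-k singular directions.
U, S, Vt = np.linalg.svd(X)
k = 2
word_vectors = U[:, :k] * S[:k]  # each row is a k-dimensional word vector
print(word_vectors[idx["like"]])
```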
- Iteration-based methods:
- Language models:
- Word2vec (a minimal training sketch follows after this list):
- 2 algorithms:
- Continuous bag-of-words (CBOW).
- Skip-gram.
- 2 training methods:
- Negative sampling.
- Hierarchical softmax.
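A minimal sketch of one skip-gram update trained with negative sampling; the vocabulary size, embedding dimension, learning rate, and the uniform negative-sampling distribution are simplifying assumptions (word2vec actually samples negatives from a smoothed unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 100, 16                      # assumed toy sizes
W_in = rng.normal(0, 0.1, (vocab_size, dim))   # center-word ("input") vectors
W_out = rng.normal(0, 0.1, (vocab_size, dim))  # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, num_neg=5, lr=0.025):
    """One SGD step on a (center, context) index pair with negative sampling."""
    v_c = W_in[center]
    grad_c = np.zeros_like(v_c)

    # Positive pair: push sigmoid(u_o . v_c) toward 1.
    u_o = W_out[context]
    g = sigmoid(u_o @ v_c) - 1.0
    grad_c += g * u_o
    W_out[context] -= lr * g * v_c

    # Negative pairs: push sigmoid(u_k . v_c) toward 0.
    for k in rng.integers(0, vocab_size, size=num_neg):
        u_k = W_out[k]
        g = sigmoid(u_k @ v_c)
        grad_c += g * u_k
        W_out[k] -= lr * g * v_c

    W_in[center] -= lr * grad_c

# Example update on a made-up (center word, context word) index pair:
sgns_step(center=3, context=7)
```

CBOW is the mirror image of this setup: it averages the context-word vectors and predicts the center word, while hierarchical softmax replaces negative sampling with a binary tree over the vocabulary.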