Continuous Bag of Words (CBoW) model for creating word embeddings from a given corpus, then performing PCA and relational mapping using the embeddings.

anishLearnsToCode/word-embeddings

Creating Word Embeddings Using The CBoW Model

CBoW (Continuous Bag of Words Model)

The CBoW model architecture tries to predict the current target word (the center word) from its source context words (the surrounding words). Consider the simple sentence "the quick brown fox jumps over the lazy dog". It can be split into (context_window, target_word) pairs: with a context window of size 2, we get examples like ([quick, fox], brown), ([the, brown], quick), ([the, dog], lazy), and so on. The model then tries to predict each target_word from its context_window words.
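The pair generation above can be sketched as follows. This is a minimal illustration, not the repository's code; here `half_window` counts words on each side of the target, so the README's "context window of size 2" (two context words in total) corresponds to `half_window=1`:

```python
# Build (context_window, target_word) training pairs for CBoW.
def cbow_pairs(tokens, half_window=1):
    pairs = []
    for i, target in enumerate(tokens):
        # Take up to half_window words on each side of the target.
        context = tokens[max(0, i - half_window):i] + tokens[i + 1:i + 1 + half_window]
        pairs.append((context, target))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
pairs = cbow_pairs(sentence)
# pairs[2] is ([quick, fox], brown), matching the example in the text.
```

Words at the sentence boundaries simply get shorter contexts; real implementations often pad instead.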

(CBoW model diagram)

Implementation

We first introduce the Continuous Bag of Words (CBoW) model in this Jupyter Notebook, then implement it on a small dataset of text from the works of Shakespeare and create word embeddings for a few words in this Notebook.

(CBoW model architecture diagram)

We then use pre-trained word embeddings from Google's standard word2vec implementation and show how to perform PCA (Principal Component Analysis) on the embeddings. We also show how to perform logical comparisons (analogies) and language translation using word embeddings in this Notebook.
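The idea behind relational mapping and analogies is plain vector arithmetic over the embeddings. A minimal sketch with toy 3-d vectors (illustrative values, not real word2vec embeddings): the vector king − man + woman should land closest to queen under cosine similarity.

```python
import numpy as np

# Toy embeddings chosen to illustrate the analogy geometry; real word2vec
# vectors are 300-dimensional and learned from data.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "apple": np.array([0.1, 0.5, 0.5]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman ~ queen: rank all words not in the query by similarity.
query = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(query, emb[w]))
```

With real pre-trained vectors the same computation is what libraries such as gensim expose through their `most_similar`-style analogy queries.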

Overview

The overall pipeline consists of the following steps:

  1. Generating the Word Embeddings Using CBoW Model: An Overview
  2. Generating Word Embeddings from a Corpus
  3. Performing PCA (Principal Component Analysis) on Word Embeddings
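Step 3 projects the high-dimensional embeddings onto their top principal components so they can be plotted in 2-D. A minimal sketch using centering plus SVD (a random matrix stands in for the real embedding matrix; the notebooks may use a library routine instead):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))   # 10 "words", each a 50-d embedding

# PCA: center each dimension, then project onto the top right-singular
# vectors of the centered matrix (the principal axes).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T              # 2-D coordinates for plotting
```

The first component captures at least as much variance as the second, which is what makes the 2-D scatter plot a faithful summary of the dominant structure.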

Further Reading

  1. Speech and Language Processing ~Jurafsky
  2. word2vec ~Wikipedia
  3. word2vec ~Google
