Anish Sachdeva (DTU/2K16/MC/13)
Natural Language Processing - Dr. Seba Susan
📕 One Hot Vector | 📕 Term Frequency (TF) | 📕 Term Frequency - Inverse Document Frequency (TF-IDF) | ✒ Report
In many applications that feed words into Machine Learning or Deep Learning models, we can't use raw text directly: models perform numerical computations, and those are not defined on character sequences.
To address this, we convert words into vectors and then apply the readily available techniques of Linear Algebra and Optimization to our data. Words can be converted into vectors in many different ways, and many online sources already provide pre-computed word vectors.
In this assignment we compute word vectors from a resume using the following techniques:
- One Hot Vectors
- Term Frequency (TF) Vectors
- Term Frequency - Inverse Document Frequency (TF-IDF) Vectors
- ⭐ One Hot Vector
- ⭐ Term Frequency (TF) Vectors
- ⭐ Term Frequency - Inverse Document Frequency (TF-IDF) Vectors
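To make the three representations concrete, here is a minimal pure-Python sketch. It is not the code from this repository. The toy `documents` corpus and all function names are hypothetical, and the TF and IDF formulas shown (relative frequency and an unsmoothed natural-log IDF) are only one common variant; the scripts in `src` may use a different one.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens.
documents = [
    "python developer with python experience".split(),
    "java developer".split(),
]

# A sorted vocabulary gives every word a fixed position in the vector.
vocabulary = sorted({word for doc in documents for word in doc})
index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Vector with a 1 at the word's vocabulary index and 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[index[word]] = 1
    return vector


def term_frequency(doc):
    """Relative frequency of every vocabulary word in one document."""
    counts = Counter(doc)
    return [counts[word] / len(doc) for word in vocabulary]


def inverse_document_frequency():
    """log(N / df) per word, where df is the number of documents containing it."""
    n = len(documents)
    return [
        math.log(n / sum(1 for doc in documents if word in doc))
        for word in vocabulary
    ]


def tf_idf(doc):
    """Element-wise product of the TF vector and the IDF vector."""
    idf = inverse_document_frequency()
    return [tf * weight for tf, weight in zip(term_frequency(doc), idf)]


print(vocabulary)
print(one_hot("python"))
print(term_frequency(documents[0]))
print(tf_idf(documents[0]))
```

In this sketch a word that appears in every document (here `developer`) gets an IDF of 0, so its TF-IDF weight vanishes, while a word that is frequent in one document but absent from the other (here `python`) gets the largest weight.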
Clone this project on your machine and enter the src directory.
git clone https://github.com/anishLearnsToCode/bow-representation.git
cd bow-representation/src
Install Requirements:
pip install -r requirements.txt
View the vector outputs:
python one-hot-vector.py
python tf.py
python tfidf.py
Run the notebooks to see interactive output (starting from the directory where you cloned the project):
cd bow-representation/notebooks
jupyter notebook