Different Bag of Words representation like One Hot Vector, TF (Term frequency) & TF-IDF in NLP.

anishLearnsToCode/bow-representation

BOW (Bag of Words) Representations

Anish Sachdeva (DTU/2K16/MC/13)

Natural Language Processing - Dr. Seba Susan

📕 One Hot Vector | 📕 Term Frequency (TF) | 📕 Term Frequency - Inverse Document Frequency (TF-IDF) | ✒ Report

📖 Overview

  1. Introduction
  2. 🚩 Results
  3. Running it on Your Machine
  4. Bibliography

Introduction

In many applications that feed words to machine learning or deep learning models, we can't pass the words in directly as text: machines can't perform numerical and analytical tasks on raw character sequences, and they perform better when given numerical input.

To work around this, we convert words into vectors and then apply readily available techniques from linear algebra and optimization to the data. There are many methods for converting words into vectors, and many sources online already provide pre-computed word vectors.

In this assignment we compute word vectors from a resume using the following techniques:

  1. One Hot Vectors
  2. Term Frequency (TF) Vectors
  3. Term Frequency - Inverse Document Frequency (TF-IDF) Vectors
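
As a rough illustration of the three techniques listed above, here is a minimal sketch in plain Python. It is not the code in this repository: the toy `documents` corpus, the function names, and the choice of raw counts for TF with `log(N / df)` for IDF are all assumptions made for the example (other common variants use log-scaled TF or smoothed IDF).

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration; the project itself works on a resume.
documents = [
    "python developer with python experience",
    "java developer",
    "machine learning with python",
]

# Whitespace tokenization is an assumption; real text needs proper preprocessing.
tokenized = [doc.split() for doc in documents]
vocabulary = sorted({word for doc in tokenized for word in doc})
index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Vector with a single 1 at the word's position in the vocabulary."""
    vector = [0] * len(vocabulary)
    vector[index[word]] = 1
    return vector


def term_frequency(tokens):
    """Raw count of each vocabulary word within one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def inverse_document_frequency():
    """idf(t) = log(N / df(t)), where df(t) counts the documents containing t."""
    n_docs = len(tokenized)
    return [
        math.log(n_docs / sum(1 for doc in tokenized if word in doc))
        for word in vocabulary
    ]


def tf_idf(tokens):
    """Element-wise product of the TF vector and the corpus-wide IDF vector."""
    idf = inverse_document_frequency()
    return [tf * weight for tf, weight in zip(term_frequency(tokens), idf)]


if __name__ == "__main__":
    print(vocabulary)
    print(one_hot("python"))
    print(term_frequency(tokenized[0]))
    print(tf_idf(tokenized[0]))
```

In this sketch a one-hot vector identifies a single word, while the TF and TF-IDF vectors describe a whole document. Words that appear in most documents get a small IDF weight, so TF-IDF plays down common terms relative to raw counts.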

🚩 Results

  1. One Hot Vector
  2. Term Frequency (TF) Vectors
  3. Term Frequency - Inverse Document Frequency (TF-IDF) Vectors

Running it on Your Machine

Clone this project on your machine and enter the src directory.

git clone https://github.com/anishLearnsToCode/bow-representation.git
cd bow-representation/src

Install Requirements:

pip install -r requirements.txt

Print the vector outputs:

python one-hot-vector.py
python tf.py
python tfidf.py

Run the notebooks to see interactive output (from the src directory entered above):

cd ../notebooks
jupyter notebook

Bibliography

  1. Speech and Language Processing, Jurafsky & Martin
  2. nltk
  3. pickle
  4. pandas
  5. pandas.DataFrame
  6. Indexing and Slicing on Pandas DataFrames
  7. numpy
