Sentiment Analysis on IMDB Movie Reviews

By Parth Mistry

The IMDB dataset contains 50,000 movie reviews for natural language processing or text analytics.
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets.
We have a set of 25,000 highly polar movie reviews for training and 25,000 for testing.
The task is to predict whether each review is positive or negative, using either classification or deep learning algorithms.
This project uses logistic regression to classify the reviews.

Code and Resources

Python Version: 3.7
Packages: pandas, numpy, sklearn, nltk, pickle
Dataset: IMDB Movie Reviews

Steps Performed

Transforming Documents to Feature Vectors
Checking word relevancy using TF-IDF
Calculating TF-IDF of each term
Removing noisy data
Tokenization of documents
Transforming Text Data into TF-IDF Vectors
Document Classification using Logistic Regression
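
The cleaning, tokenization, and TF-IDF steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function names `clean`, `tokenize`, and `tfidf` and the two sample reviews are hypothetical, and no nltk stemming or stop-word removal is applied. The weighting assumes the smoothed formula that scikit-learn's `TfidfVectorizer` uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization.

```python
import math
import re
from collections import Counter


def clean(text):
    """Strip HTML tags and non-letter characters, then lowercase."""
    text = re.sub(r"<[^>]*>", " ", text)     # drop tags such as <br />
    text = re.sub(r"[^a-zA-Z]+", " ", text)  # keep letters only
    return text.lower().strip()


def tokenize(text):
    """Split cleaned text on whitespace."""
    return clean(text).split()


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (assumed formula, see lead-in).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


# Hypothetical sample reviews, used only to exercise the functions.
docs = [
    "A great movie.<br />Great acting!",
    "A dull movie with poor acting.",
]
vectors = tfidf(docs)
```

In this sketch, a term that is frequent in one review but absent from the other (such as "great") receives a higher weight than a term shared by both reviews (such as "movie"), which is the relevancy signal TF-IDF is meant to capture.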

Model Preparation

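from sklearn.linear_model import LogisticRegressionCV

# Logistic regression with built-in 5-fold cross-validation over the
# regularization strength; the best value is selected by accuracy.
clf = LogisticRegressionCV(cv=5,
                           scoring='accuracy',
                           random_state=0,   # reproducible results
                           n_jobs=-1,        # use all CPU cores
                           verbose=3,        # print progress during fitting
                           max_iter=300)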

I trained logistic regression on the cleaned data, and the model reached 89% accuracy in classifying movie reviews.
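
As a rough end-to-end sketch of how such a model might be fitted, saved, and reloaded: the toy reviews and labels below are hypothetical stand-ins for the IMDB data, and the variable names and the file name are illustrative. Since pickle is listed among the packages, the sketch assumes the model is persisted with it; the notebook's own saving code is not shown here. `n_jobs` and `verbose` are left at their defaults to keep the example quiet.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV

# Toy data (hypothetical): ten positive and ten negative reviews,
# enough for 5-fold stratified cross-validation.
positive = ["great film", "wonderful acting", "excellent story",
            "loved it", "brilliant movie"] * 2
negative = ["terrible film", "awful acting", "boring story",
            "hated it", "dull movie"] * 2
reviews = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Turn the raw text into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)

# Same core settings as in Model Preparation.
clf = LogisticRegressionCV(cv=5, scoring="accuracy",
                           random_state=0, max_iter=300)
clf.fit(X, labels)

# Persist the fitted model with pickle, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved_model.sav")  # illustrative file name
    with open(path, "wb") as fh:
        pickle.dump(clf, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

# New text must go through the same fitted vectorizer before prediction.
new_X = vectorizer.transform(["wonderful story"])
prediction = restored.predict(new_X)
```

Note that the fitted vectorizer has to be kept alongside the model: a reloaded classifier can only score text that was transformed with the same vocabulary and IDF weights it was trained on.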
