IMDb-Movie-Review

Sentiment Analysis on IMDb Movie Reviews.

IMDb (Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and internet streams, including cast, production crew and personnel biographies, plot summaries, trivia, and fan reviews and ratings.

This project is an entry for the Kaggle competition Bag of Words Meets Bags of Popcorn.

Dependencies

You can install the dependencies by running the following commands in the Anaconda prompt:

# Theano
conda install mingw libpython
conda install mkl=2017.0.3

# Keras
pip install keras

We also need the NLTK (Natural Language Toolkit) package. It comes preinstalled with Anaconda.

If it is not installed, you can install it by running this command in the Anaconda prompt:

# NLTK
conda install -c conda-forge nltk

After installing the NLTK package, we need to download the NLTK data.

import nltk
nltk.download()

This opens the NLTK Downloader window. Select all packages and download them.

Dataset

There are two datasets: 1) labeledTrainData and 2) testData.
Both were downloaded from the Kaggle competition Bag of Words Meets Bags of Popcorn.

labeledTrainData has 25000 rows and 3 columns: id, sentiment, review.
testData has 25000 rows and only 2 columns: id and review. We have to predict the sentiment of these reviews.
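
As a rough illustration of the layout described above, the sketch below reads a tab-separated file with a header row using only the standard library. The two sample rows, their ids, and the `read_reviews` helper are hypothetical and are not taken from the repository's scripts, which use read_csv().

```python
import csv
import io

# Hypothetical two-row sample in the layout described above:
# a header row, then tab-separated id, sentiment, review.
SAMPLE_TSV = (
    "id\tsentiment\treview\n"
    "\"0001_8\"\t1\t\"A sample positive review.\"\n"
    "\"0002_3\"\t0\t\"A sample negative review.\"\n"
)

def read_reviews(handle):
    """Return a list of dicts, one per row, keyed by the header names."""
    # QUOTE_NONE treats double quotes as ordinary characters, so quotes
    # inside a review cannot merge or split rows during parsing.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)

rows = read_reviews(io.StringIO(SAMPLE_TSV))
print(len(rows), list(rows[0].keys()))
```

For a file on disk, the same helper could be called with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.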

Sentiment Analysis

Sentiment analysis of the IMDb movie review datasets is done using two different machine learning algorithms:

Random Forest
Recurrent Neural Network

First, we trained a Random Forest model. Its score on Kaggle is 0.84176.

We also trained LSTM and GRU recurrent neural networks using different preprocessing techniques, such as Porter stemming and stop word removal. These give a training accuracy in the range of 91.57% to 92.76%, and a Kaggle score in the range of 0.86768 to 0.87896.

The highest Kaggle score across all algorithms and preprocessing techniques is 0.87896, obtained with the LSTM recurrent neural network.

How to work with the code

Sentiment Analysis Using Bags of Words - Random Forest

Change the path passed to read_csv() for the training data to the location of your labeledTrainData.tsv.
Change the path passed to read_csv() for the test data to the location of your testData.tsv.
Run the file.
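
For readers unfamiliar with the bag-of-words representation named in this section's title, here is a minimal standard-library sketch of the idea: each review becomes a vector of word counts over a fixed vocabulary. It covers only the featurization, not the Random Forest classifier, and it is not the script's own code. The helper names, the sample reviews, and the `max_features=5000` default are assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents, max_features=5000):
    """Map the most frequent words to column indices (sorted alphabetically)."""
    counts = Counter(word for doc in documents for word in tokenize(doc))
    most_common = [word for word, _ in counts.most_common(max_features)]
    return {word: index for index, word in enumerate(sorted(most_common))}

def vectorize(text, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word in tokenize(text):
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

# Hypothetical reviews, used only to show the shape of the output.
reviews = ["Great movie, great cast", "Terrible movie"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]
print(vocabulary)
print(vectors)
```

Each row of `vectors` could then serve as the feature vector for one review when training a classifier.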

Sentiment Analysis Using RNN - Recurrent Neural Network.

Change the paths of data_train and data_test in 'Sentiment Analysis using RNN.py' to the locations of the respective datasets.
Run the file.
First, it asks which preprocessing method to use: Porter stemming, stop words, or neither. It processes the data accordingly.
Then, it asks which model to use: LSTM RNN or GRU RNN.
Compile the model, and it will create a CSV file containing the predicted sentiment of the test data.
Now, to predict the sentiment of your own review, run 'Predict Class For IMDb Movie Review.py'.
It asks which model and which preprocessing method to use, and then predicts the sentiment of the review.
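
To give a feel for what the stop words option does, the sketch below cleans a review and optionally drops stop words. It is an assumption-laden illustration rather than the script's code: the `clean_review` helper, the HTML-tag stripping step, and the tiny `STOP_WORDS` set are all hypothetical, and Porter stemming is left out because it needs NLTK.

```python
import re

# Tiny illustrative stop-word list; a real run would use a much longer
# list, such as the English one shipped with NLTK.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "was"}

def clean_review(text, remove_stop_words=True):
    """Strip HTML tags and non-letters, lowercase, optionally drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)    # drop tags such as <br />
    text = re.sub(r"[^a-zA-Z]", " ", text)  # keep letters only
    words = text.lower().split()
    if remove_stop_words:
        words = [word for word in words if word not in STOP_WORDS]
    return words

sample = "This movie was great!<br />Loved it."  # hypothetical review
with_stop_word_removal = clean_review(sample)
without_stop_word_removal = clean_review(sample, remove_stop_words=False)
print(with_stop_word_removal)
print(without_stop_word_removal)
```

Choosing "neither" in the prompt would correspond roughly to calling the helper with `remove_stop_words=False`.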

Output

The code creates a CSV file of predictions for Kaggle submission, containing two columns: id and sentiment.
id is the "id" column from testData, and sentiment is the value predicted by the model.
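
The two-column layout described above can be sketched with the standard library as follows. The `write_submission` helper, the sample ids, and the predictions are hypothetical; the repository's scripts may write the file differently.

```python
import csv
import io

def write_submission(handle, ids, predictions):
    """Write a two-column CSV with the header id,sentiment."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "sentiment"])
    for review_id, label in zip(ids, predictions):
        writer.writerow([review_id, int(label)])

# Hypothetical ids and predicted labels, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(buffer, ["0001_8", "0002_3"], [1, 0])
submission_text = buffer.getvalue()
print(submission_text)
```

To write to disk instead, pass a handle from `open(path, "w", newline="", encoding="utf-8")`.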

To learn more about recurrent neural networks, check this course.

Read more about sentiment analysis using deep learning methods in this paper by Lei Zhang (LinkedIn Corporation), Shuai Wang, and Bing Liu.
