This repository contains the code for a sentiment analysis project on Reddit comments, conducted by Sai Sankeerth Thallapally (U00861167) & Rahul Penchala (U00869508). The project aims to classify the sentiment of comments as either positive or negative using machine learning techniques. It includes three Python script files for different phases of the project and links to the required datasets.
EXPERIMENT-1.ipynb
: Jupyter notebook for the first experiment.EXPERIMENT-2.ipynb
: Jupyter notebook for the second experimental setup.Final code.ipynb
: Jupyter notebook containing the final sentiment analysis model.
The large dataset and Word2Vec model files are not included in this repository, but can be downloaded from the following links:
-
Reddit Comments Dataset (May 2015):
- Description: Contains Reddit comments from May 2015.
- Size: Approximately 21 GB.
- Download Link: Reddit Comments May 2015 - Kaggle
-
GoogleNews-vectors-negative300.bin:
- Description: Pre-trained Word2Vec model from Google News.
- Size: About 3 GB.
- Download Link: GoogleNews Vectors Negative300 - Kaggle
After downloading, place the database.sqlite
(extracted from the Reddit Comments Dataset) and GoogleNews-vectors-negative300.bin
files in the project directory alongside the Jupyter notebooks.
-
To ensure the smooth execution of the notebooks, it's recommended to download the entire project repository as a ZIP file. After downloading, unzip the file to extract its contents into a single directory. This approach helps maintain the file structure and relative paths, making it easier to run the code without encountering file path issues.
-
Once you've downloaded and extracted the files, follow the instructions under the "How to Run the Code" section. Make sure that all prerequisite software and libraries are installed, and that the data files (
database.sqlite
andGoogleNews-vectors-negative300.bin
) are placed in the same directory as the Jupyter notebooks. -
By following these steps, you'll be able to run the notebooks seamlessly and replicate the project's results.
- Python 3.x
- Jupyter Notebook
- Libraries: pandas, nltk, textblob, gensim, keras, sklearn, imblearn, sqlite3
-
Install Python: Download from Python's official website.
-
Install Jupyter Notebook:
pip install notebook
- Install Libraries:
pip install pandas nltk textblob gensim keras sklearn imblearn sqlite3
- Open Terminal (macOS/Linux) or PowerShell (Windows).
- Navigate to the project directory with the
.ipynb
files. - Launch Jupyter Notebook:
jupyter notebook
- Open the desired notebook from the Jupyter browser interface.
- Run the notebook cells by pressing Shift + Enter, or use "Run All" in the toolbar.
- Verify all libraries are installed.
- Restart the Jupyter notebook kernel if necessary.
- Check the file paths to
database.sqlite
andGoogleNews-vectors-negative300.bin
.
For any queries related to this project, please contact
- Sai Sankeerth Thallapally at (sthllpll@memphis.edu)
- Rahul Penchala at (rpnchala@memphis.edu)