
Project Title: Optimizing Sentiment Analysis Models for Accurate Predictions


Overview

Sentiment analysis is a technique in natural language processing (NLP) and text mining that involves analyzing and determining the emotional tone or sentiment expressed in text. This project aims to evaluate and compare the performance of various word embeddings and sequence-to-sequence (seq2seq) models for sentiment analysis of restaurant reviews. The goal is to identify the best-performing models and configurations for analyzing customer sentiment.

Dataset

  • The dataset used in this analysis is composed of three files: train.json, test.json, and val.json.
  • These files contain a total of 8,879 reviews for a single restaurant.
  • Each review is categorized into one of the following eight aspects: food, service, staff, price, ambience, menu, place, and miscellaneous. The sentiment associated with each review is classified as positive, negative, or neutral.
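A minimal loading sketch is shown below. It assumes each JSON file is a flat array of review records; the column names used later (such as `text` and `polarity`) are illustrative and should be checked against the actual keys.

```python
import json

import pandas as pd

def load_split(path):
    # Assumes each file is a flat JSON array of review records.
    with open(path, encoding="utf-8") as f:
        return pd.DataFrame(json.load(f))

train_df = load_split("train.json")
val_df = load_split("val.json")
test_df = load_split("test.json")

print(len(train_df) + len(val_df) + len(test_df), "reviews in total")
```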

Objective

The objective of this project is to evaluate and compare the performance of various word embedding techniques and seq2seq models for sentiment analysis. The goal is to identify the best-performing combination for analyzing customer sentiment based on review text.

Analysis Approach

Text Preprocessing & Data Preparation

  • Clean review data by removing special characters, stop words, and irrelevant information (a minimal cleaning sketch follows this list).
  • Prepare word embeddings (a Skipgram training sketch follows this list):
    • GloVe: Uses pre-trained word vectors for better contextual understanding.
    • Word2Vec (CBOW & Skipgram): Captures semantic relationships between words.
    • FastText (CBOW & Skipgram): Enhances word representations, including out-of-vocabulary words.
  • Prepare seq2seq models:
    • LSTM with Different Location Aspect: Incorporates location-based sentiment aspects.
    • LSTM with Attention: Implements an attention mechanism to focus on relevant text portions (a model sketch follows this list).
    • LSTM with Double Attention: Applies attention at both word and sentence levels.
  • Ablation Study
    • RNN Model: Baseline model for comparison.
    • GRU Model: Alternative to LSTM with a simpler architecture and improved training efficiency.
  • Interpret Results:
    • Comparing model performance reveals key factors (e.g., location, attention mechanisms) that impact sentiment prediction. The double attention LSTM is expected to provide the most detailed insights into review content and sentiment drivers.
  • Recommendation:
    • The LSTM with Double Attention should be used for sentiment analysis to offer actionable insights into customer experience. Additionally, location-based sentiment analysis can help tailor strategies for different branches.
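As a concrete illustration of the cleaning step, here is a minimal sketch using nltk stop words. The regex-based cleaning rule is an assumption for illustration; the notebooks may apply different rules.

```python
import re

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def clean_review(text):
    """Lowercase, strip special characters, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(clean_review("The food was great, but the service... not so much!"))
```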
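Training the Word2Vec Skipgram embeddings with gensim might look like the sketch below. It reuses `clean_review` and `train_df` from the earlier sketches, and the vector size, window, and minimum count are illustrative defaults, not the project's tuned values.

```python
from gensim.models import Word2Vec

# Token lists produced by clean_review above; assumes a "text" column.
tokenized_reviews = [clean_review(t) for t in train_df["text"]]

# sg=1 selects skip-gram (sg=0 would be CBOW).
w2v = Word2Vec(
    sentences=tokenized_reviews,
    vector_size=100,
    window=5,
    min_count=2,
    sg=1,
    workers=4,
)

print(w2v.wv.most_similar("food", topn=5))
```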
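Finally, a minimal torch sketch of the "LSTM with Attention" variant: a bidirectional LSTM whose hidden states are pooled by learned attention weights before classification. The layer sizes and the handling of aspect inputs are assumptions; the notebooks define the actual architecture.

```python
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    """Minimal 'LSTM with Attention' sketch: a BiLSTM whose hidden
    states are pooled with learned attention weights."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=3):
        super().__init__()
        # The embedding matrix could be initialized from the Word2Vec
        # skip-gram vectors trained above (w2v.wv.vectors).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # one score per time step
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor of word indices
        states, _ = self.lstm(self.embedding(token_ids))   # (B, T, 2H)
        weights = torch.softmax(self.attn(states), dim=1)  # (B, T, 1)
        context = (weights * states).sum(dim=1)            # weighted sum over T
        return self.classifier(context), weights.squeeze(-1)

model = AttentionLSTM(vocab_size=10_000)
logits, attn = model(torch.randint(1, 10_000, (4, 32)))
print(logits.shape, attn.shape)  # torch.Size([4, 3]) torch.Size([4, 32])
```

Returning the attention weights alongside the logits is what makes the attention visualizations in Figures 8 and 9 possible.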

Key Findings

After evaluating multiple word embeddings and models, the best-performing combination for sentiment analysis was found to be Word2Vec Skipgram embeddings combined with the LSTM with Attention model. This combination provided:

  • Superior word representation, capturing semantic relationships.
  • Enhanced contextual understanding of customer reviews.
  • Improved sentiment classification accuracy and interpretability.

How to Run the Code

  1. Install Required Libraries: Ensure all necessary libraries, such as pandas, matplotlib, seaborn, tensorflow, and gensim, are installed, as imported in the notebook files.
  2. Load the Dataset: Import the dataset by loading the train.json, test.json, and val.json files.
  3. Run the Analysis Notebooks: Execute the analysis notebooks in Jupyter to process the data, build and train the model, and visualize the results.

Run this process in Google Colab for easy execution and visualization.

Technologies Used

  • Python Code: Data processing and analysis were done in Python using libraries like pandas and numpy for data manipulation, gensim for Word2Vec and FastText embeddings, and nltk for text preprocessing tasks such as tokenization, stopwords removal, and stemming. Model evaluation was carried out with scikit-learn, and deep learning models were built and trained using torch.

  • Visualization: For visualizing the results, matplotlib and seaborn were used for plotting, while wordcloud was utilized to generate word clouds to illustrate sentiment and aspect-wise insights.
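For example, a word cloud of the positive reviews could be generated as below; the `polarity` and `text` column names follow the (assumed) loading sketch above.

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Join all positive reviews into one string.
positive_text = " ".join(train_df.loc[train_df["polarity"] == "positive", "text"])

wc = WordCloud(width=800, height=400, background_color="white").generate(positive_text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.title("Frequent words in positive reviews")
plt.show()
```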

Results & Visualizations

Figure 1: Model Validation - LSTM with Different Location Aspect Using Word2Vec SkipGram Embeddings

Figure 2: Model Validation - LSTM with Different Location Aspect Using GRU Ablation Study

Figure 3: Model Validation - LSTM with Double Attention Using Word2Vec SkipGram Embeddings

Figure 4: Model Validation - LSTM with Double Attention Using GRU Ablation Study

Figure 5: Model Validation - LSTM with Attention Using RNN Ablation Study

Figure 6: Model Validation - LSTM with Attention Using Word2Vec SkipGram Embeddings (best model)

Figure 7: Training loss and training accuracy over epochs for the best model

Figure 8: Attention visualization for the Staff aspect using the best model

Figure 9: Attention visualization for the Service aspect using the best model

Note: For a detailed report, please see this report.

Recommendation

To improve the model, we could try the following:

  • Use Transformer Models: Replace the LSTM with BERT, RoBERTa, or DistilBERT for better contextual understanding (a minimal sketch follows this list).
  • Enhance Word Embeddings: Fine-tune Word2Vec, FastText, or switch to pre-trained transformer embeddings.
  • Fine-Tune Hyperparameters: Optimize learning rate, dropout, and batch size using Bayesian search.
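As a starting point for the transformer suggestion, an off-the-shelf Hugging Face pipeline can serve as a quick baseline. This is a sketch, not part of this repository; three-way (positive/negative/neutral) aspect sentiment would still require fine-tuning on the review dataset.

```python
from transformers import pipeline

# Pre-trained binary sentiment model used as a quick baseline.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The staff were friendly but the food was disappointing."))
```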

Contact

📧 Email: phanchenh99@gmail.com

🔗 LinkedIn | Portfolio