This repository contains a package for solving the sentiment prediction problem using Python and TensorFlow.
The GoEmotions dataset was used for modelling and analysis. It is a human-annotated dataset of Reddit comments labelled with 27 emotions plus neutral. See the GoEmotions repository (https://github.com/google-research/goemotions) for more details.
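GoEmotions is also available through TensorFlow Datasets; the sketch below shows one way to pull it for a quick look (this repository may load the data differently):

```python
import tensorflow_datasets as tfds

# Load GoEmotions from TensorFlow Datasets (one possible source; this repository
# may obtain the data differently). Each example carries the raw comment text
# and one boolean flag per emotion label.
ds = tfds.load("goemotions", split="train")
for example in ds.take(1):
    print(example.keys())  # inspect the available feature keys
```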
This repository contains notebooks that train a classifier based on the BERT model; the classifier itself is defined in bert_model.py.
The models solve a multiclass, multilabel classification problem. See the notebook bert_model_v0.7.1.ipynb.
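To illustrate the multilabel setup (a sketch under assumptions, not the exact architecture in bert_model.py), a BERT encoder from TF Hub can be topped with a sigmoid output layer and trained with binary cross-entropy, so each emotion is predicted independently:

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by the BERT preprocessor

NUM_LABELS = 28  # 27 emotions + neutral

# TF Hub handles for the preprocessor and the small BERT encoder; the exact
# versions here are assumptions and may differ from those used in this repository.
PREPROCESS_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_URL = "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2"

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
encoder_inputs = hub.KerasLayer(PREPROCESS_URL, name="preprocessing")(text_input)
encoder_outputs = hub.KerasLayer(ENCODER_URL, trainable=True, name="encoder")(encoder_inputs)
# Sigmoid (not softmax) head: the labels are not mutually exclusive.
probs = tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid")(encoder_outputs["pooled_output"])

model = tf.keras.Model(text_input, probs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(3e-5),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.AUC(multi_label=True)],
)
```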
High-quality data is important for good modelling. standardize.py provides a text standardization layer for TensorFlow that cleans chat/comment-style English text. The layer is integrated into the model (and can be reused in other models), so the model works directly with raw text. The following transformations are applied (a simplified sketch of some of them is shown after the list):
- fix unicode chars (unidecode lib)
- replace contractions with full words
- fix stretched letters in words (e.g. soooooo, youuuuuuuu)
- replace chat language with full phrases (e.g. lol, asap)
- remove placeholders used in GoEmotions dataset (e.g. [NAME], [RELIGION])
- remove words containing numbers
- remove /r tags used in Reddit comments (GoEmotions source)
- remove all characters except for letters, some punctuation and hyphens
- replace duplicated punctuation with a single char
- properly set punctuation: no space before, one space after
- remove multiple spaces and trim
- convert to lowercase
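Below is a minimal sketch of how a subset of these steps can be expressed with TensorFlow string ops (this is not the actual standardize.py, and the regular expressions are simplified assumptions), so the cleaning runs inside the graph and the model can consume raw strings:

```python
import tensorflow as tf

def standardize(text: tf.Tensor) -> tf.Tensor:
    """Simplified sketch of a few cleaning steps; not the repository's standardize.py."""
    text = tf.strings.regex_replace(text, r"\[[A-Z]+\]", " ")      # GoEmotions placeholders, e.g. [NAME]
    text = tf.strings.regex_replace(text, r"/r/\w+", " ")          # Reddit /r tags
    text = tf.strings.lower(text)                                  # lowercase
    text = tf.strings.regex_replace(text, r"\S*\d+\S*", " ")       # words containing numbers
    text = tf.strings.regex_replace(text, r"[^a-z.,!?'\- ]", " ")  # keep letters, some punctuation, hyphens
    text = tf.strings.regex_replace(text, r"\.{2,}", ".")          # collapse duplicated punctuation
    text = tf.strings.regex_replace(text, r"!{2,}", "!")
    text = tf.strings.regex_replace(text, r"\?{2,}", "?")
    text = tf.strings.regex_replace(text, r" *([.,!?]) *", r"\1 ") # no space before, one space after punctuation
    text = tf.strings.regex_replace(text, r"\s{2,}", " ")          # collapse multiple spaces
    return tf.strings.strip(text)                                  # trim

# The function can be wrapped in a Keras layer so raw strings go straight into the model:
standardize_layer = tf.keras.layers.Lambda(standardize, name="standardize")
print(standardize(tf.constant(["Thanks, [NAME]!!!   See /r/aww"])))
```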
Various handy functions used for modelling and analysis can be found in utils.py.
You can use the notebooks as examples to develop your own models. See requirements.txt for the list of required libraries.
Using small_bert/bert_en_uncased_L-2_H-128_A-2, the classifier reached the following metrics.

Emotion prediction:
- F1-Score (micro): 0.5835
- F1-Score (macro): 0.5070

Sentiment prediction:
- F1-Score (micro): 0.7760
- F1-Score (macro): 0.7349
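For reference, micro- and macro-averaged F1 scores for a multilabel classifier can be computed from binarized predictions, e.g. with scikit-learn (a generic sketch; the 0.5 threshold and the variable names are assumptions, not this repository's evaluation code):

```python
import numpy as np
from sklearn.metrics import f1_score

# y_true: binary indicator matrix (n_samples, n_labels); y_prob: predicted probabilities.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]])
y_pred = (y_prob >= 0.5).astype(int)  # the 0.5 decision threshold is an assumption

print("F1 (micro):", f1_score(y_true, y_pred, average="micro"))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
```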