Approaches for Sentiment Analysis
Where: The idea was taken from this blog.
Summary: The idea is to use a pre-trained BERT transformer model with a simple classifier head. We add a special token label to BERT, which we call “CLASS”. This “CLASS” token is appended to each input sentence and essentially tells BERT that the sentence is done. We take BERT’s output at this last position, which should represent the overall sentiment of the sentence, and feed it to a simple neural network that predicts the sentiment class.
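The classifier-head step can be sketched as follows; this is a toy forward pass with NumPy, not the actual model code, and the hidden size and weights are illustrative placeholders (BERT's real hidden size is 768, and the head weights would be learned during fine-tuning):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

def classifier_head(pooled_output, W, b):
    """Single linear layer plus softmax applied to BERT's output at the
    special classification token; returns [p_negative, p_positive]."""
    return softmax(W @ pooled_output + b)

# Toy example: hidden size 4 instead of BERT's 768.
rng = np.random.default_rng(0)
pooled = rng.standard_normal(4)   # stand-in for BERT's output vector
W = rng.standard_normal((2, 4))   # head weights (learned in practice)
b = np.zeros(2)
probs = classifier_head(pooled, W, b)
```

In the real setup, the head is trained jointly with (or on top of) the frozen/fine-tuned BERT weights; only the final-position output vector enters the head.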
Where: http://www.da.inf.ethz.ch/teaching/2020/CIL/files/exercises/exercise06.pdf Part 5 and 6
Summary: Take our dataset, filter and tokenize it according to the solution-code scripts, and build the co-occurrence matrix. From that, compute GloVe embeddings (one embedding per token). For the tokenization, see the Piazza post where I tried to reconstruct what is done: https://piazza.com/class/k6hqt5l6hyd46t?cid=107. For each sentence, apply the same tokenization, go through each token, look up its computed GloVe embedding, and average all embeddings of the sentence. On the resulting averaged embedding vectors, run an sklearn classifier.
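The co-occurrence counting and the per-sentence averaging can be sketched like this; the whitespace tokenizer and the embedding values below are placeholders (the real embeddings come from GloVe training on the co-occurrence matrix, and the real tokenization follows the solution-code scripts):

```python
import numpy as np
from collections import Counter

def cooccurrence_counts(token_lists, window=2):
    """Count how often two tokens appear within `window` positions
    of each other, summed over all sentences."""
    counts = Counter()
    for tokens in token_lists:
        for i, t in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(t, tokens[j])] += 1
    return counts

# Placeholder GloVe-style embeddings (in the real pipeline these are
# learned from the co-occurrence matrix), dimension 3 for illustration.
embeddings = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "bad":   np.array([-0.8, 0.2, 0.1]),
    "movie": np.array([0.0, 0.5, 0.5]),
}

def sentence_embedding(sentence, embeddings, dim=3):
    """Tokenize (here: naive whitespace split), look up each token's
    embedding, and average them; unknown tokens are skipped."""
    vectors = [embeddings[t] for t in sentence.lower().split() if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

vec = sentence_embedding("good movie", embeddings)
```

The averaged vectors are then the feature matrix handed to whatever sklearn classifier is used.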
Where: Seen at https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4. Documentation at https://textblob.readthedocs.io/en/dev/
Summary: Feed whole sentences to the TextBlob classifier, which gives a polarity score between -1 (negative) and 1 (positive).
Where: Seen at https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4. Paper https://www.researchgate.net/publication/275828927_VADER_A_Parsimonious_Rule-based_Model_for_Sentiment_Analysis_of_Social_Media_Text
Summary: Feed whole sentences to the VADER classifier, which gives a polarity score between -1 (negative) and 1 (positive).
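Both TextBlob and VADER are lexicon-and-rule-based scorers. As a rough illustration of the underlying idea only (this toy scorer is neither library, and its word scores and negation rule are made up; the real lexicons are far larger and handle intensifiers, punctuation, capitalization, etc.):

```python
# Toy lexicon with hand-picked polarity scores in [-1, 1] (illustrative only).
LEXICON = {"great": 0.8, "good": 0.5, "bad": -0.6, "awful": -0.9}
NEGATIONS = {"not", "never", "no"}

def polarity(sentence):
    """Average the lexicon scores of known words, flipping the sign of a
    word that directly follows a negation; 0.0 if no known word occurs."""
    tokens = sentence.lower().split()
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            s = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                s = -s
            scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0
```

Because no training is involved, these approaches serve mainly as quick baselines against the learned models above.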
Where: Code based on https://towardsdatascience.com/simple-bert-using-tensorflow-2-0-132cb19e9b22
Summary: Do additional preprocessing of the input sentences by removing punctuation and user/URL tags. Tokenization of the input sentences is handled by a ready-made library. Uses a pre-trained (AL)BERT transformer model with a classifier head added on top, which outputs the class (neg/pos) probabilities. An additional Dropout layer was added to the network to reduce overfitting.
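The extra preprocessing step could look roughly like this; the exact tag strings (`<user>`, `<url>`) are an assumption about the dataset format, and the tags must be stripped before punctuation removal, since removing punctuation first would destroy the angle brackets the tag pattern matches:

```python
import re
import string

def preprocess(sentence):
    """Strip <user>/<url> placeholder tags, then remove punctuation
    and collapse the remaining whitespace."""
    sentence = re.sub(r"<user>|<url>", " ", sentence)
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sentence.split())

cleaned = preprocess("<user> this movie is great!!! see <url>")
```

The cleaned sentences are then passed to the library tokenizer before entering the (AL)BERT model.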