Approaches for Sentiment Analysis
Where: The idea was taken from this blog.
Summary: The idea is to use a pre-trained BERT transformer model with a simple classifier head. We add a special token label to BERT, which we call “CLASS”. This “CLASS” token is appended to each input sentence and essentially tells BERT that the sentence is done. We take BERT’s output at this last position, which should represent the overall sentiment of the sentence, and feed it to a simple neural network that predicts the sentiment class.
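The classifier-head step can be sketched as follows; this is a toy forward pass with NumPy, not the actual model code, and the hidden size and weights are illustrative placeholders (BERT's real hidden size is 768, and the head weights would be learned during fine-tuning):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

def classifier_head(pooled_output, W, b):
    """Single linear layer plus softmax applied to BERT's output at the
    special classification token; returns [p_negative, p_positive]."""
    return softmax(W @ pooled_output + b)

# Toy example: hidden size 4 instead of BERT's 768.
rng = np.random.default_rng(0)
pooled = rng.standard_normal(4)   # stand-in for BERT's output vector
W = rng.standard_normal((2, 4))   # head weights (learned in practice)
b = np.zeros(2)
probs = classifier_head(pooled, W, b)
```

In the real setup, the head is trained jointly with (or on top of) the frozen/fine-tuned BERT weights; only the final-position output vector enters the head.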
Where: http://www.da.inf.ethz.ch/teaching/2020/CIL/files/exercises/exercise06.pdf Part 5 and 6
Summary: Take our dataset, filter and tokenize it according to the solution-code scripts, and build the co-occurrence matrix. From that, compute GloVe embeddings (one embedding per token). For the tokenization, see the Piazza post where I tried to reconstruct what is done: https://piazza.com/class/k6hqt5l6hyd46t?cid=107. For each sentence, apply the same tokenization, go through each token, look up its computed GloVe embedding, and average all embeddings of the sentence. On the resulting averaged embedding vectors, run an sklearn classifier.
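The co-occurrence counting and the per-sentence averaging can be sketched like this; the whitespace tokenizer and the embedding values below are placeholders (the real embeddings come from GloVe training on the co-occurrence matrix, and the real tokenization follows the solution-code scripts):

```python
import numpy as np
from collections import Counter

def cooccurrence_counts(token_lists, window=2):
    """Count how often two tokens appear within `window` positions
    of each other, summed over all sentences."""
    counts = Counter()
    for tokens in token_lists:
        for i, t in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(t, tokens[j])] += 1
    return counts

# Placeholder GloVe-style embeddings (in the real pipeline these are
# learned from the co-occurrence matrix), dimension 3 for illustration.
embeddings = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "bad":   np.array([-0.8, 0.2, 0.1]),
    "movie": np.array([0.0, 0.5, 0.5]),
}

def sentence_embedding(sentence, embeddings, dim=3):
    """Tokenize (here: naive whitespace split), look up each token's
    embedding, and average them; unknown tokens are skipped."""
    vectors = [embeddings[t] for t in sentence.lower().split() if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

vec = sentence_embedding("good movie", embeddings)
```

The averaged vectors are then the feature matrix handed to whatever sklearn classifier is used.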
Where: Seen at https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4. Documentation at https://textblob.readthedocs.io/en/dev/
Summary: Feed whole sentences to the TextBlob classifier, which gives a polarity score between -1 (negative) and 1 (positive).
Where: Seen at https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4. Paper https://www.researchgate.net/publication/275828927_VADER_A_Parsimonious_Rule-based_Model_for_Sentiment_Analysis_of_Social_Media_Text
Summary: Feed whole sentences to the VADER classifier, which gives a polarity score between -1 (negative) and 1 (positive).
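Both TextBlob and VADER are lexicon-and-rule-based scorers. As a rough illustration of the underlying idea only (this toy scorer is neither library, and its word scores and negation rule are made up; the real lexicons are far larger and handle intensifiers, punctuation, capitalization, etc.):

```python
# Toy lexicon with hand-picked polarity scores in [-1, 1] (illustrative only).
LEXICON = {"great": 0.8, "good": 0.5, "bad": -0.6, "awful": -0.9}
NEGATIONS = {"not", "never", "no"}

def polarity(sentence):
    """Average the lexicon scores of known words, flipping the sign of a
    word that directly follows a negation; 0.0 if no known word occurs."""
    tokens = sentence.lower().split()
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            s = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                s = -s
            scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0
```

Because no training is involved, these approaches serve mainly as quick baselines against the learned models above.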
Where: Code based on https://towardsdatascience.com/simple-bert-using-tensorflow-2-0-132cb19e9b22
Summary: Do additional preprocessing of the input sentences by removing punctuation and user/URL tags. Tokenization of the input sentences is handled by a ready-made library. Uses a pre-trained (AL)BERT transformer model with a classifier head added on top, which outputs the class (neg/pos) probabilities. An additional Dropout layer was added to the network to reduce overfitting.
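The extra preprocessing step could look roughly like this; the exact tag strings (`<user>`, `<url>`) are an assumption about the dataset format, and the tags must be stripped before punctuation removal, since removing punctuation first would destroy the angle brackets the tag pattern matches:

```python
import re
import string

def preprocess(sentence):
    """Strip <user>/<url> placeholder tags, then remove punctuation
    and collapse the remaining whitespace."""
    sentence = re.sub(r"<user>|<url>", " ", sentence)
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sentence.split())

cleaned = preprocess("<user> this movie is great!!! see <url>")
```

The cleaned sentences are then passed to the library tokenizer before entering the (AL)BERT model.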