AncaElena10/Humor-Detection
I have provided a solution for humor detection on binary-labeled data.

Features extracted:

- removal of words shorter than 3 characters
- text cleaning: contractions are expanded to their complete form (e.g. "haven't" becomes "have not")
- stop-word removal, limited to a few stop words: I don't remove all of them because some, like "not" or "no", give the text a negative sentiment, and removing them would automatically turn the sentence into a positive one, which is wrong
- punctuation removal
- stemming
- lemmatization
- n-grams (2 and 3 words)

Algorithms used:

- Logistic Regression
- Naive Bayes (Bernoulli, Multinomial)
- SVM (LinearSVC; SVC with a linear or rbf kernel; NuSVC with a linear or rbf kernel)
- DecisionTree
- RandomForest
- KNeighbors

The best values of C (LogisticRegression), random_state (DecisionTree, RandomForest), nu (NuSVC) and n_neighbors (KNeighbors) were chosen with GridSearchCV. The columns correspond to the preprocessing steps above (sw = stop words, all = every step combined):

| Parameter | clean | sw | punct | stem | lemm | ngrams | all |
|---|---|---|---|---|---|---|---|
| C (LogisticRegression) | 1000 | 100 | 1000 | 100 | 1000 | 100 | 0.001 |
| nu (NuSVC, rbf) | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.7 | 0.7 |
| nu (NuSVC, linear) | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| n_neighbors (KNeighbors) | 11 | 21 | 17 | 19 | 13 | 21 | 21 |
| random_state (RandomForest) | 1 | 12 | 13 | 18 | 15 | 12 | 12 |
| random_state (DecisionTree) | 15 | 10 | 9 | 11 | 7 | 8 | 15 |
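The text-preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the repository's code: the contraction table, the reduced stop-word list and the exemption of negations from the short-word filter are hypothetical choices made here (a length-3 cutoff on its own would also drop "no"). Stemming and lemmatization are left out to keep the example dependency-free; NLTK, for instance, offers `PorterStemmer` and `WordNetLemmatizer`, but the README does not say which tools were used.

```python
import re
import string

# Hypothetical contraction table; the README does not list the mappings it used.
CONTRACTIONS = {
    "haven't": "have not",
    "don't": "do not",
    "can't": "can not",
    "won't": "will not",
    "it's": "it is",
    "i'm": "i am",
}
CONTRACTION_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")

# Hypothetical reduced stop-word list: negations such as "not" and "no"
# are deliberately absent so they survive stop-word removal.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "was", "were",
              "of", "to", "in", "on", "at", "it"}

# Assumption: negations are also exempt from the short-word filter.
NEGATIONS = {"no", "not"}


def expand_contractions(text: str) -> str:
    """'clean' step: rewrite e.g. "haven't" as "have not" (expects lowercase text)."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)


def remove_punctuation(text: str) -> str:
    """'punct' step: delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stop_words(tokens: list[str]) -> list[str]:
    """'sw' step: drop only the tokens found in the reduced stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def remove_short_words(tokens: list[str], min_len: int = 3) -> list[str]:
    """Drop tokens shorter than min_len characters, keeping negations."""
    return [t for t in tokens if len(t) >= min_len or t in NEGATIONS]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """'ngrams' step: join every run of n consecutive tokens with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text: str) -> list[str]:
    """Chain the steps; contractions are expanded before punctuation is stripped."""
    text = expand_contractions(text.lower())
    tokens = remove_punctuation(text).split()
    return remove_short_words(remove_stop_words(tokens))


tokens = preprocess("I haven't seen the funniest joke, no?")
print(tokens)             # ['have', 'not', 'seen', 'funniest', 'joke', 'no']
print(ngrams(tokens, 2))  # ['have not', 'not seen', 'seen funniest', 'funniest joke', 'joke no']
```

Expanding contractions before removing punctuation matters: stripping the apostrophe first would turn "haven't" into "havent", which the table no longer matches.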
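Comparing the listed algorithms could look roughly like the following scikit-learn sketch, which cross-validates one pipeline per classifier. Several things here are assumptions rather than details from the project: the 12 toy sentences are placeholders, not the repository's dataset; `CountVectorizer(ngram_range=(1, 3))` (unigrams plus the 2- and 3-word n-grams) stands in for whatever vectorizer was actually used; the hyperparameters are illustrative fixed values, not the tuned ones from the table; and `compare_classifiers` is a hypothetical helper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder toy data (1 = humorous, 0 = not humorous), not the project's dataset.
TEXTS = [
    "why did the chicken cross the road to get to the other side",
    "i told my wife she draws her eyebrows too high and she looked surprised",
    "i used to be a banker but i lost interest",
    "parallel lines have so much in common it is a shame they will never meet",
    "i am reading a book about anti gravity and it is impossible to put down",
    "the scarecrow won an award because he was outstanding in his field",
    "the meeting has been moved to thursday afternoon",
    "please submit the quarterly report by friday",
    "the train to the city departs every twenty minutes",
    "the museum is closed on public holidays",
    "heavy rain is expected across the region tomorrow",
    "the library extended its opening hours this week",
]
LABELS = [1] * 6 + [0] * 6

# One estimator per algorithm named in the README; hyperparameters are
# illustrative values, not the tuned ones reported in the table.
CLASSIFIERS = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "SVC (linear)": SVC(kernel="linear"),
    "SVC (rbf)": SVC(kernel="rbf"),
    "NuSVC (linear)": NuSVC(kernel="linear", nu=0.3),
    "NuSVC (rbf)": NuSVC(kernel="rbf", nu=0.3),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNeighbors": KNeighborsClassifier(n_neighbors=3),
}


def compare_classifiers(texts, labels, cv=3):
    """Return each classifier's mean cross-validated accuracy."""
    scores = {}
    for name, clf in CLASSIFIERS.items():
        # Bag of unigrams, bigrams and trigrams feeding the classifier;
        # cross_val_score clones the pipeline for every fold.
        pipe = make_pipeline(CountVectorizer(ngram_range=(1, 3)), clf)
        scores[name] = cross_val_score(pipe, texts, labels, cv=cv).mean()
    return scores


for name, score in compare_classifiers(TEXTS, LABELS).items():
    print(f"{name:20s} {score:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline means the vocabulary is refit on each training fold, so the held-out fold never leaks into the features.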
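The GridSearchCV tuning of C, nu, n_neighbors and random_state can be sketched in the same style. Inside a pipeline, each parameter name is prefixed with its step name (`clf__C`). The grids are assumptions sized for the 12-sentence placeholder set: `n_neighbors` stops at 7 because each training fold holds only 8 samples, whereas on a real dataset the grid would need to reach values such as the 21 reported in the table. `tune`, `SEARCHES`, the toy data and `n_estimators=20` are hypothetical choices, not the project's code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import NuSVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder toy data (1 = humorous, 0 = not humorous), not the project's dataset.
TEXTS = [
    "why did the chicken cross the road to get to the other side",
    "i told my wife she draws her eyebrows too high and she looked surprised",
    "i used to be a banker but i lost interest",
    "parallel lines have so much in common it is a shame they will never meet",
    "i am reading a book about anti gravity and it is impossible to put down",
    "the scarecrow won an award because he was outstanding in his field",
    "the meeting has been moved to thursday afternoon",
    "please submit the quarterly report by friday",
    "the train to the city departs every twenty minutes",
    "the museum is closed on public holidays",
    "heavy rain is expected across the region tomorrow",
    "the library extended its opening hours this week",
]
LABELS = [1] * 6 + [0] * 6

# (estimator, grid) per tuned parameter named in the README; grid values are assumptions.
SEARCHES = {
    "LogisticRegression": (LogisticRegression(max_iter=1000),
                           {"C": [0.001, 0.01, 0.1, 1, 10, 100, 1000]}),
    "NuSVC (rbf)": (NuSVC(kernel="rbf"), {"nu": [0.1, 0.3, 0.5, 0.7]}),
    "NuSVC (linear)": (NuSVC(kernel="linear"), {"nu": [0.1, 0.3, 0.5, 0.7]}),
    "KNeighbors": (KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}),
    "RandomForest": (RandomForestClassifier(n_estimators=20),
                     {"random_state": list(range(1, 21))}),
    "DecisionTree": (DecisionTreeClassifier(), {"random_state": list(range(1, 21))}),
}


def tune(texts, labels, cv=3):
    """Run one grid search per classifier and return the best parameters found."""
    best = {}
    for name, (clf, grid) in SEARCHES.items():
        pipe = Pipeline([("vec", CountVectorizer(ngram_range=(1, 3))), ("clf", clf)])
        # Prefix every parameter with the pipeline step name, e.g. "clf__C".
        prefixed = {f"clf__{key}": values for key, values in grid.items()}
        search = GridSearchCV(pipe, prefixed, cv=cv)
        search.fit(texts, labels)
        best[name] = search.best_params_
    return best


for name, params in tune(TEXTS, LABELS).items():
    print(name, params)
```

When several grid values tie on mean accuracy, GridSearchCV keeps the first one in the grid, so on small datasets the reported "best" value can depend on the order in which the grid is listed.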
About
A solution for humor detection on binary-labeled data, using Python and classification algorithms such as Naive Bayes, KNN, SVM and Decision Trees.