This repository contains some research paper implementations and some extensions of Hugging Face transformers for text similarity.
I tried several approaches on a simple dataset where the task is to classify texts as spam or ham.
For the embeddings I tried multiple strategies:
- TF_IDF
- Word2Vec
- Doc2Vec
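As an illustration of the first strategy, here is a minimal, dependency-free sketch of TF-IDF vectors. It assumes whitespace tokenization and the classic tf × log(N / df) weighting; `tokenize` and `tf_idf` are hypothetical helpers, not code from this repository. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2-normalize each row by default, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        # Term frequency (count / document length) weighted by inverse document frequency.
        vectors.append([counts[tok] / total * idf[tok] if total else 0.0 for tok in vocab])
    return vocab, vectors
```

A term that appears in every document (for example "now" in `["win a free prize now", "are we meeting now", "free prize claim now"]`) gets an IDF of 0 and therefore contributes nothing, while rare terms such as "win" get the largest weights. These vectors are what a random forest would be trained on.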
I then trained a random forest (RF) on each embedding, as well as an RNN with LSTM layers.
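The models are compared on precision, recall and accuracy. A minimal sketch of how the three metrics are computed, assuming spam is labelled 1 and treated as the positive class (the function name is hypothetical; scikit-learn provides `precision_score`, `recall_score` and `accuracy_score` for the same purpose):

```python
def precision_recall_accuracy(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Binary metrics with label 1 (spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham wrongly flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Toy example (made-up labels): 4 spam, 6 ham; the model catches only 2 spam and never flags ham.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(precision_recall_accuracy(y_true, y_pred))  # → (1.0, 0.5, 0.8)
```

When ham outnumbers spam, a classifier can keep accuracy high while missing many spam messages, so recall is the column to watch when comparing the embeddings.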
The scores I got:
| Model | Precision | Recall | Accuracy |
|---|---|---|---|
| TF_IDF + RF | 0.99 | 0.78 | 0.97 |
| Word2Vec + RF | 0.46 | 0.24 | 0.87 |
| Doc2Vec + RF | 0.81 | 0.35 | 0.91 |
| RNN + text_to_sequence | 0.99 | 0.96 | 0.99 |

I also tried tuning the hyperparameters with different methods and libraries:
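| Method | Time (in min) | Accuracy |
|---|---|---|
| Random forest (RF) | 2.4 | 0.97 |
| Grid Search CV | 25.6 | 0.97 |
| Pipeline | 10.9 | 0.95 |
| Skopt | 19.3 | 0.97 |
| Hyperopt | 28:12 | 0.95 |
| Optuna | 40 | 0.97 |

Optuna took the longest (40 min) and reached 0.97 accuracy: better than Pipeline and Hyperopt (0.95), but no better than the plain random forest, Grid Search CV or Skopt.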
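Grid search is expensive because it evaluates every combination in the parameter grid. A minimal sketch of that exhaustive loop (the idea behind scikit-learn's `GridSearchCV`), assuming a hypothetical `toy_score` that stands in for the mean cross-validated accuracy of a random forest; the parameter names and values are illustrative, not the ones used here:

```python
from itertools import product
from typing import Callable


def grid_search(param_grid: dict[str, list], score_fn: Callable[[dict], float]) -> tuple[dict, float]:
    """Evaluate every combination in param_grid and return the best params and score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # in practice: train the model with params, return mean CV accuracy
        if score > best_score:  # strict '>' keeps the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: dict) -> float:
    # Hypothetical stand-in for cross-validated accuracy, peaking at n_estimators=200, max_depth=10.
    return -abs(params["n_estimators"] - 200) / 100 - abs(params["max_depth"] - 10) / 10


grid = {"n_estimators": [100, 200, 300], "max_depth": [5, 10, 20]}
print(grid_search(grid, toy_score))  # → ({'max_depth': 10, 'n_estimators': 200}, 0.0)
```

With 3 × 3 values this already means 9 full trainings (times the number of CV folds), which is why search time grows so much faster than a single RF fit. Skopt, Hyperopt and Optuna instead pick the next combination based on the results of earlier trials rather than enumerating the whole grid.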
Pawan300/NLP