Pawan300/NLP

This repository contains programs related to NLP (natural language processing).

  • It contains some research paper implementations and some extensions of the Hugging Face Transformers library for text similarity.

  • I tried several approaches on a simple dataset where the task is to classify text as spam or ham.
    I tried multiple strategies for building the embeddings:

    • TF_IDF
    • Word2Vec
    • Doc2Vec
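The README does not include the feature-extraction code, so here is a minimal, dependency-free sketch of two of these strategies under stated assumptions. `tfidf_vectors` weights raw term counts by the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default (but without its L2 normalisation or regex tokenizer). `mean_word_vector` shows one common way, assumed here and not confirmed by the repository, to turn Word2Vec word vectors into a fixed-length document vector for a random forest: averaging the vectors of the known words. All function names and the toy word vectors are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace (a simplification of real tokenizers).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors) with weight = raw term count * smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in TfidfVectorizer(smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


def mean_word_vector(text, word_vectors, dim):
    # Average the vectors of in-vocabulary tokens; return zeros if no token is known.
    known = [word_vectors[t] for t in tokenize(text) if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


docs = ["free prize call now", "call me later", "free free entry"]
vocab, vectors = tfidf_vectors(docs)
print(vocab)  # ['call', 'entry', 'free', 'later', 'me', 'now', 'prize']

# Hypothetical 2-dimensional word vectors standing in for a trained Word2Vec model.
toy_vectors = {"free": [1.0, 0.0], "prize": [0.0, 1.0]}
print(mean_word_vector("Free prize", toy_vectors, dim=2))  # [0.5, 0.5]
```

Doc2Vec differs from the averaging approach: it learns a vector for each document directly during training, so no pooling step is needed.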

    I then trained a random forest and an RNN with LSTM layers.
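In the results table the RNN input is labelled `text_to_sequence`, which most likely refers to Keras' `Tokenizer.texts_to_sequences` followed by `pad_sequences`; that is an assumption, since the code is not shown. The dependency-free sketch below mimics that step: it builds a frequency-ranked word index (index 0 reserved for padding), maps each message to integers, and left-pads to a fixed length, the shape an `Embedding` + `LSTM` model takes as input. The helper names are hypothetical re-implementations, not the Keras functions, and ties in word frequency are broken alphabetically here for determinism.

```python
from collections import Counter


def build_word_index(texts):
    # Most frequent word gets index 1; index 0 is reserved for padding.
    counts = Counter(w for t in texts for w in t.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ranked)}


def to_sequences(texts, word_index):
    # Replace each known word by its index; unknown words are dropped.
    return [[word_index[w] for w in t.lower().split() if w in word_index] for t in texts]


def pad_left(seqs, maxlen):
    # Keep the last `maxlen` indices and left-pad with zeros (Keras' default "pre" mode).
    padded = []
    for s in seqs:
        s = s[max(len(s) - maxlen, 0):]
        padded.append([0] * (maxlen - len(s)) + s)
    return padded


texts = ["win a free prize", "free call now", "see you later"]
word_index = build_word_index(texts)
padded = pad_left(to_sequences(texts, word_index), maxlen=4)
print(padded)  # [[8, 2, 1, 6], [0, 1, 3, 5], [0, 7, 9, 4]]
```

Unlike the averaged or TF-IDF vectors, these sequences keep word order, which is what lets the LSTM model context.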

    The scores I got are:

    Model                     Precision  Recall  Accuracy
    TF_IDF + RF               0.99       0.78    0.97
    Word2Vec + RF             0.46       0.24    0.87
    Doc2Vec + RF              0.81       0.35    0.91
    RNN + text_to_sequence    0.99       0.96    0.99
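The three columns can diverge sharply: Word2Vec + RF reaches 0.87 accuracy with only 0.24 recall. That happens when one class dominates the data, which is typical of spam/ham collections (the README does not state the class balance). The sketch below computes the three scores from label lists, assuming `spam` is the positive class; the ten labels are invented for illustration and are not the repository's data.

```python
def classification_scores(y_true, y_pred, positive="spam"):
    """Return (precision, recall, accuracy) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, accuracy


# Invented example: 2 spam and 8 ham; the model catches one spam and never flags ham.
y_true = ["spam", "spam"] + ["ham"] * 8
y_pred = ["spam", "ham"] + ["ham"] * 8
print(classification_scores(y_true, y_pred))  # (1.0, 0.5, 0.9)
```

Perfect precision with half the spam missed still yields 90% accuracy, a miniature version of the high-precision, lower-recall pattern in the TF_IDF + RF row.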

    I also tried tuning the hyperparameters with different methods and libraries:

    Method                 Time (min)      Accuracy
    Random forest (RF)     2.4             0.97
    Grid Search CV         25.6            0.97
    Pipeline               10.9            0.95
    Skopt                  19.3            0.97
    Hyperopt               ~28 (28:12)     0.95
    Optuna                 40              0.97
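The README reports timings but not the tuning code. As a hedged sketch of what the Grid Search CV row involves, the example below tunes a TF-IDF + random forest pipeline with scikit-learn's `GridSearchCV`; the twelve messages, the `tfidf`/`rf` step names and the parameter grid are made up for illustration and are not taken from the repository.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny invented corpus; replace with the real spam/ham messages.
texts = [
    "win a free prize now", "free entry claim your reward", "urgent call to claim cash",
    "you have won a free holiday", "claim your free gift card", "cash prize waiting for you",
    "see you at lunch", "are we still meeting today", "call me when you get home",
    "thanks for the notes", "running late be there soon", "happy birthday have a great day",
]
labels = ["spam"] * 6 + ["ham"] * 6

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=0)),  # fixed seed for reproducibility
])

# Parameters of pipeline steps are addressed as <step name>__<parameter>.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
}

# cv=3 uses stratified 3-fold cross-validation for a classifier.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 3))
```

Grid search fits every combination on every fold (here 8 combinations × 3 folds = 24 fits, plus a final refit), so its cost grows multiplicatively with the grid. That is consistent with Grid Search CV taking roughly ten times as long as the single random forest run in the table. Skopt, Hyperopt and Optuna instead choose trial points sequentially, so their run time depends mainly on how many trials they are given.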

    Optuna took the longest (40 min) and reached 0.97 accuracy: the same as the untuned random forest, Grid Search CV and Skopt, and higher than Pipeline and Hyperopt (0.95).
