Text Classification with pre-trained Embedding

Problem

Classify a given set of PubMed abstracts (biomedical literature abstracts) into four classes:

(a) Abstracts containing drug adverse events
(b) Abstracts containing congenital anomalies
(c) Abstracts containing both (a) and (b)
(d) Others
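
Since the third class is the combination of the first two, the four classes can be viewed as two yes/no properties of an abstract. A minimal sketch of that mapping follows; the label ids and the name label_for are assumptions made for illustration and are not taken from the repository.

```python
# Hypothetical label ids; the repository may number the classes differently.
CLASS_NAMES = {
    0: "Drug adverse events",
    1: "Congenital anomalies",
    2: "Both",
    3: "Others",
}


def label_for(has_adverse_event: bool, has_congenital_anomaly: bool) -> int:
    """Combine two yes/no properties of an abstract into one of four class ids."""
    if has_adverse_event and has_congenital_anomaly:
        return 2
    if has_adverse_event:
        return 0
    if has_congenital_anomaly:
        return 1
    return 3


# An abstract that mentions both topics falls into the combined class.
print(CLASS_NAMES[label_for(True, True)])
```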

Dataset: PubMed (https://pubmed.ncbi.nlm.nih.gov/)

Required Libraries

Python 3
numpy
tensorflow
keras
sklearn
pandas
bs4
requests
matplotlib
scipy
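
Some of the names above are import names rather than pip package names (sklearn is installed as scikit-learn, and bs4 as beautifulsoup4). The following sketch checks which of the listed libraries are importable in the current environment; the helper name missing_packages is an assumption and is not part of the repository.

```python
import importlib.util

# Import name -> pip package name for the libraries listed above.
REQUIRED = {
    "numpy": "numpy",
    "tensorflow": "tensorflow",
    "keras": "keras",
    "sklearn": "scikit-learn",
    "pandas": "pandas",
    "bs4": "beautifulsoup4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return pip names of the libraries that cannot be found for import."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```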

Download Data

The script data_download.py downloads all the required data for the four classes

Each class includes 700 examples
The class Others has twice as many examples (1400) to keep all classes balanced

$ python data_download.py
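
The contents of data_download.py are not shown here, so the following is only a sketch of one way to collect PubMed article ids for a class: the NCBI E-utilities esearch endpoint. The search term, the retmax default, and the function names are assumptions made for illustration; the actual script may scrape the PubMed website with requests and bs4 instead.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (returns PubMed ids for a query).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=700):
    """Build a search URL that asks for up to `retmax` PubMed ids as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def fetch_ids(term, retmax=700):
    """Download the list of PubMed ids for `term` (needs network access)."""
    import requests  # imported lazily so the URL helper works without it

    response = requests.get(esearch_url(term, retmax), timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]


# Hypothetical query for one class; the real search terms are not documented.
print(esearch_url("congenital anomalies"))
```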

Train the model

To train the model, run the following:

$ python NLP_Classification.py --task train

To evaluate the model's performance, run the following:

$ python NLP_Classification.py --task test
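
Both commands call the same script and differ only in the value of --task. A minimal sketch of how such a flag could be handled with argparse is shown here; the function names and the returned messages are placeholders, and the real NLP_Classification.py may be organized differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --task must be either 'train' or 'test'."""
    parser = argparse.ArgumentParser(description="Text classification of PubMed abstracts")
    parser.add_argument(
        "--task",
        choices=["train", "test"],
        required=True,
        help="train: fit the model, test: evaluate a trained model",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Dispatch to the requested step (placeholders stand in for the real work)."""
    args = parse_args(argv)
    if args.task == "train":
        return "training the model"
    return "evaluating the model"


# Equivalent to running the script with "--task train".
print(main(["--task", "train"]))
```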

Conclusion

There are many possible ways to train such a model; however, the following points are worth mentioning:

Neural network based models generally perform better
First, a tokenizer generates integer arrays from the text
Second, an Embedding layer generates an array representation for each sequence
We can use an RNN (simple or LSTM), a CNN, or an attention-based model
My results are based on a CNN model
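
The pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in NLP_Classification.py: the whitespace tokenizer is a plain-Python stand-in for whatever tokenizer the script uses, and the layer sizes, the function names, and the randomly initialized Embedding layer (rather than one loaded with pre-trained vectors) are all hypothetical choices.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved ids: padding and out-of-vocabulary tokens


def build_vocab(texts, max_words=20000):
    """Map the most frequent lowercase tokens to integer ids starting at 2."""
    counts = Counter(token for text in texts for token in text.lower().split())
    return {token: i + 2 for i, (token, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=8):
    """Turn one text into a fixed-length list of ids (truncate or right-pad)."""
    ids = [vocab.get(token, OOV) for token in text.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def build_cnn(vocab_size, max_len, embed_dim=100, num_classes=4):
    """Embedding -> Conv1D -> global max pooling -> softmax over the classes."""
    from tensorflow import keras  # imported lazily; requires tensorflow

    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        keras.layers.Embedding(vocab_size, embed_dim),
        keras.layers.Conv1D(128, 5, activation="relu"),
        keras.layers.GlobalMaxPooling1D(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Made-up example texts, only to show the shape of the tokenizer output.
texts = ["Drug induced rash in adults", "Congenital heart defect in newborns"]
vocab = build_vocab(texts)
sequences = [encode(text, vocab) for text in texts]
print(sequences)
```

With real data, build_cnn(len(vocab) + 2, max_len) would give a model whose input matches the encoded sequences; the + 2 accounts for the padding and out-of-vocabulary ids.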

nimahamidi/Text-Classification-with-Embedding
