text_classification_hierarchical_cnn

Text classification using char-CNN + word-CNN

Reference:

Convolutional Neural Networks for Sentence Classification http://www.aclweb.org/anthology/D14-1181
Character-Aware Neural Language Models https://arxiv.org/pdf/1508.06615.pdf

File configurations:

Set the variable 'dataset_identity' (a dataset-specific name) in class DataConfig. Create a folder named "dump_" + dataset_identity under the 'data' folder. All dataset-specific artifacts (vocab, embedding_matrix) are stored in this folder automatically, which keeps experiments on multiple datasets consistent.
Likewise, copy the train, valid and test files into a folder under 'data' named after the value of 'dataset_identity', and update the file/folder paths in class DataConfig in 'utils/feature_extraction.py' accordingly.
Data format: Label TAB Utterance (no spaces around the "\t")
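As an illustration of the layout and data format described above, here is a minimal sketch. The helper names (dump_dir_for, parse_line, read_dataset) are hypothetical and are not taken from the repository; only the "dump_" + dataset_identity naming and the Label TAB Utterance format come from this README.

```python
import os


def dump_dir_for(dataset_identity, data_root="data"):
    """Return the per-dataset dump folder: <data_root>/dump_<dataset_identity>."""
    return os.path.join(data_root, "dump_" + dataset_identity)


def parse_line(line):
    """Split one 'Label<TAB>Utterance' line into (label, utterance)."""
    label, sep, utterance = line.rstrip("\n").partition("\t")
    if not sep:
        # No tab found: the line does not follow the expected format.
        raise ValueError("expected 'Label<TAB>Utterance', got: %r" % line)
    return label, utterance


def read_dataset(path):
    """Read a train/valid/test file into a list of (label, utterance) pairs."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

Splitting on the first tab only means an utterance that itself contains a tab is kept intact rather than truncated.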

Hyperparameters/architecture related configurations:

Model and architecture settings (number of layers, filters, optional fully connected layers, dropout, epochs, batch_size, lr, etc.) can be adjusted in class 'ModelConfig' in 'utils/feature_extraction.py'.
The current code uses SENNA 50d embeddings (https://ronan.collobert.com/senna/), but you can use your own embeddings: change the filename in the variable 'embedding_file' in class 'DataConfig'.
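If you swap in your own embeddings, the general idea of turning an embedding file into an embedding matrix can be sketched as follows. This is an assumption-laden example, not the repository's loader: it assumes a text file with one word followed by its vector values per line, and the function names and the random initialisation range for unknown words are illustrative choices.

```python
import random


def parse_embedding_lines(lines, dim=50):
    """Parse lines of the form 'word v1 v2 ... v_dim' into a dict.

    Lines with an unexpected number of fields are skipped (assumed to be
    headers or malformed entries).
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) != dim + 1:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim=50, seed=0):
    """Return one row per vocab word, in vocab order.

    Words missing from `vectors` get a small random vector; the range
    used here is an arbitrary choice for illustration.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in vectors:
            matrix.append(list(vectors[word]))
        else:
            matrix.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return matrix
```

A file can be passed directly, since an open text file iterates over its lines: `with open(embedding_file, encoding="utf-8") as f: vectors = parse_embedding_lines(f)`.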

Training/Testing => train.py

For Training:

main(Flags.TRAIN, load_existing_dump=False)

'load_existing_dump': If False, the vocabs, embedding_matrix, etc. are created from the input dataset and saved into 'dump_dir' as described above. If True, they are loaded from the existing 'dump_dir' without being rebuilt, which is slightly faster. This is useful when training several times on the same dataset: main(Flags.TRAIN, load_existing_dump=True)
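The build-or-load behaviour of 'load_existing_dump' can be sketched roughly as below. This is not the repository's code: the pickle file name "vocab.pkl", the helper names, and the whitespace tokenisation are assumptions made for the example.

```python
import os
import pickle


def build_vocab(utterances):
    """Map each whitespace token to an integer id (0 and 1 are reserved)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for utterance in utterances:
        for token in utterance.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def get_vocab(utterances, dump_dir, load_existing_dump=False):
    """Load the vocab from dump_dir, or build it and save it there."""
    path = os.path.join(dump_dir, "vocab.pkl")  # hypothetical file name
    if load_existing_dump:
        # Reuse the dump from an earlier run; the dataset is not re-scanned.
        with open(path, "rb") as f:
            return pickle.load(f)
    vocab = build_vocab(utterances)
    os.makedirs(dump_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(vocab, f)
    return vocab
```

Loading with load_existing_dump=True fails if no dump exists yet, so the first run on a new dataset needs load_existing_dump=False.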

For Testing:

main(Flags.TEST, load_existing_dump=True)

Error Analysis:

Incorrect test predictions are automatically written to the model_saver directory.
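For readers who want to reproduce this kind of error dump on their own predictions, a minimal sketch follows. The function names and the tab-separated output layout are assumptions; the file actually written by the repository may look different.

```python
def collect_errors(utterances, gold_labels, predicted_labels):
    """Return (utterance, gold, predicted) for every misclassified example."""
    return [
        (utt, gold, pred)
        for utt, gold, pred in zip(utterances, gold_labels, predicted_labels)
        if gold != pred
    ]


def write_errors(errors, path):
    """Write misclassified examples as tab-separated rows with a header."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("utterance\tgold\tpredicted\n")
        for utt, gold, pred in errors:
            f.write("%s\t%s\t%s\n" % (utt, gold, pred))
```

Keeping the gold and predicted labels side by side makes it easy to sort the file and spot systematic confusions between two classes.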

Repository: akjindal53244/text_classification_hierarchical_cnn
