- Transformer Encoder with Char information for text classification
- This code was created by referring to the implementations of carpedm20 and DongjunLee
- Input words are represented by Char-CNN and Word2vec embeddings concatenated together (64 dimensions each)
- The standard Transformer Encoder from "Attention Is All You Need" is used
- The model is composed of 7 Transformer Encoder layers with 4 attention heads
- A Global Average Pooling layer followed by softmax is used at the end to predict the class
- The Char-CNN component follows the architecture proposed by Yoon Kim; a sketch of the full model is given below
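A hedged sketch of the model described above, in TensorFlow 1.x style to match the TF 1.8 requirement below. Only the numbers from this README are taken as given (64-dim Word2vec + 64-dim Char-CNN per word, 7 encoder layers, 4 heads, softmax classifier); every function name, hyperparameter, and shape choice here is an assumption, and positional encodings, padding masks, and dropout are omitted for brevity.

import tensorflow as tf  # sketch targets TensorFlow 1.8

def char_cnn(char_ids, char_vocab=100, char_dim=16, widths=(2, 3, 4, 5), filters=16):
    # Yoon Kim-style Char-CNN: embed characters, convolve with several
    # filter widths, max-pool over time, concat -> 4 * 16 = 64 dims per word.
    emb = tf.get_variable("char_emb", [char_vocab, char_dim])
    B, S, W = tf.shape(char_ids)[0], tf.shape(char_ids)[1], tf.shape(char_ids)[2]
    x = tf.nn.embedding_lookup(emb, char_ids)            # [B, S, W, char_dim]
    x = tf.reshape(x, [-1, W, char_dim])                 # one word per row
    pooled = [tf.reduce_max(tf.layers.conv1d(x, filters, w, activation=tf.nn.relu), axis=1)
              for w in widths]                           # max over time per width
    return tf.reshape(tf.concat(pooled, -1), [B, S, filters * len(widths)])

def encoder_layer(x, heads=4, d_model=128, d_ff=512, scope="enc"):
    # One standard Transformer encoder block: multi-head self-attention plus
    # a position-wise feed-forward net, each with a residual + layer norm.
    with tf.variable_scope(scope):
        B, S = tf.shape(x)[0], tf.shape(x)[1]
        d_head = d_model // heads

        def split(t):  # [B, S, d_model] -> [B, heads, S, d_head]
            return tf.transpose(tf.reshape(t, [B, S, heads, d_head]), [0, 2, 1, 3])

        q, k, v = (split(tf.layers.dense(x, d_model)) for _ in range(3))
        att = tf.nn.softmax(tf.matmul(q, k, transpose_b=True) / d_head ** 0.5)
        ctx = tf.reshape(tf.transpose(tf.matmul(att, v), [0, 2, 1, 3]), [B, S, d_model])
        x = tf.contrib.layers.layer_norm(x + tf.layers.dense(ctx, d_model))
        ff = tf.layers.dense(tf.layers.dense(x, d_ff, activation=tf.nn.relu), d_model)
        return tf.contrib.layers.layer_norm(x + ff)

def model(word_ids, char_ids, word2vec, num_classes=4, layers=7, heads=4):
    # 64-dim Word2vec lookup concatenated with 64-dim Char-CNN output -> 128 dims
    w_emb = tf.get_variable("word_emb", initializer=word2vec)  # [vocab, 64]
    x = tf.concat([tf.nn.embedding_lookup(w_emb, word_ids),
                   char_cnn(char_ids)], axis=-1)               # [B, S, 128]
    for i in range(layers):                                    # 7 encoder layers
        x = encoder_layer(x, heads=heads, scope="enc_%d" % i)
    pooled = tf.reduce_mean(x, axis=1)                         # global average pooling
    return tf.nn.softmax(tf.layers.dense(pooled, num_classes)) # class probabilities

Given [batch, seq_len] word ids and [batch, seq_len, word_len] char ids, model returns [batch, 4] class probabilities.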
- TensorFlow 1.8.0
- Python 3.6
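- Install TensorFlow (a suggested command, assuming pip)
$ pip install tensorflow==1.8.0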
- Clone the repository
$ git clone https://github.com/MSWon/Transformer-Encoder-with-Char.git
- Unzip data.zip and embedding.zip
$ unzip data.zip
$ unzip embedding.zip
- Train with user settings (char_mode is one of char_cnn, char_lstm, no_char)
$ python train.py --batch_size 128 --training_epochs 12 --char_mode char_cnn
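For reference, a minimal sketch of how these flags could be declared with argparse; only the flag names and example values come from the command above, and the repo's actual train.py may declare them differently.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=128)      # samples per step
parser.add_argument("--training_epochs", type=int, default=12)  # passes over the data
parser.add_argument("--char_mode", default="char_cnn",          # char representation
                    choices=["char_cnn", "char_lstm", "no_char"])
args = parser.parse_args()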
- The AG’s news topic classification dataset is constructed by choosing the 4 largest classes from the original news corpus
- The 4 classes are ‘world’, ‘sports’, ‘business’ and ‘science/technology’
- Each class contains 30,000 training samples and 1,900 test samples
- In total there are 120,000 training samples and 7,600 test samples
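For orientation, the standard AG’s News CSV release (Zhang et al., 2015) stores one sample per row as (class index 1–4, title, description). A minimal loading sketch assuming that layout and a hypothetical file path; this repo's data.zip may store the data differently.

import csv
from collections import Counter

LABELS = {1: "world", 2: "sports", 3: "business", 4: "science/technology"}

def load_agnews(path):
    # Each row is (class_index, title, description) in the standard release.
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        for cls, title, desc in csv.reader(f):
            texts.append(title + " " + desc)
            labels.append(LABELS[int(cls)])
    return texts, labels

texts, labels = load_agnews("data/train.csv")  # hypothetical path
print(Counter(labels))  # expect 30,000 samples per class -> 120,000 total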