TensorFlow implementation of "Recurrent Convolutional Neural Network for Text Classification".
- Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews (Pang and Lee, 2005).
- Download "sentence polarity dataset v1.0" at the Official Download Page.
- Located in "data/rt-polaritydata/" in my repository.
- rt-polarity.pos contains 5331 positive snippets.
- rt-polarity.neg contains 5331 negative snippets.
- Bidirectional RNN (Bi-RNN) is used to implement the left and right context vectors.
- Each context vector is created by shifting the output of Bi-RNN and concatenating a zero state indicating the start of the context.
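The shifting step described above can be sketched as follows. This is a minimal, dependency-free illustration for a single sequence, not the repository's actual code: the function name `shift_contexts` and the use of plain Python lists instead of tensors are assumptions made for clarity. The left context at step t is taken from the forward output at step t-1, and the right context from the backward output at step t+1, with a zero vector filling the position that has no neighbor.

```python
def shift_contexts(fw_outputs, bw_outputs):
    """Build left/right context vectors for one sequence.

    fw_outputs, bw_outputs: lists of per-time-step output vectors
    (lists of floats) from the forward and backward RNN, both of length T.
    """
    zero_fw = [0.0] * len(fw_outputs[0])
    zero_bw = [0.0] * len(bw_outputs[0])
    # Left context: prepend a zero state, drop the last forward output.
    c_left = [zero_fw] + fw_outputs[:-1]
    # Right context: drop the first backward output, append a zero state.
    c_right = bw_outputs[1:] + [zero_bw]
    return c_left, c_right


# Toy example: 3 time steps, hidden size 2.
fw = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
bw = [[-1.0, -2.0], [-3.0, -4.0], [-5.0, -6.0]]
c_left, c_right = shift_contexts(fw, bw)
```

With batched tensors the same idea would amount to slicing along the time axis and concatenating a zero tensor at the start (left context) or end (right context).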
- Positive data is located in "data/rt-polaritydata/rt-polarity.pos".
- Negative data is located in "data/rt-polaritydata/rt-polarity.neg".
- "GoogleNews-vectors-negative300" is used as the pre-trained word2vec model.
- Display help message:
$ python train.py --help
- Train Example:
$ python train.py --cell_type "lstm" \
    --pos_dir "data/rt-polaritydata/rt-polarity.pos" \
    --neg_dir "data/rt-polaritydata/rt-polarity.neg" \
    --word2vec "GoogleNews-vectors-negative300.bin"
- The Movie Review dataset has no test data.
- To evaluate, you should either hold out a test set from the training data or use cross-validation. However, cross-validation is not implemented in this project.
- The example below simply evaluates on the full rt-polarity dataset, which is the same data used for training.
- Evaluation Example:
$ python eval.py \
    --pos_dir "data/rt-polaritydata/rt-polarity.pos" \
    --neg_dir "data/rt-polaritydata/rt-polarity.neg" \
    --checkpoint_dir "runs/1523902663/checkpoints"
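Because the dataset ships without a test split, one option is to hold out part of the data before training. The following is a rough sketch under stated assumptions, not part of this project: `holdout_split`, its `test_ratio` and `seed` parameters, and the toy snippets are all hypothetical. It labels positive snippets 1 and negative snippets 0, shuffles them reproducibly, and sets aside a fraction for testing.

```python
import random


def holdout_split(pos_samples, neg_samples, test_ratio=0.1, seed=0):
    """Split labeled snippets into (train, test) lists of (text, label)."""
    labeled = [(s, 1) for s in pos_samples] + [(s, 0) for s in neg_samples]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(labeled)
    n_test = int(len(labeled) * test_ratio)
    return labeled[n_test:], labeled[:n_test]


# Toy data standing in for the lines of rt-polarity.pos / rt-polarity.neg.
pos = ["positive snippet %d" % i for i in range(50)]
neg = ["negative snippet %d" % i for i in range(50)]
train, test = holdout_split(pos, neg, test_ratio=0.1)
```

To use such a split with the scripts above, the held-out positive and negative snippets would presumably need to be written to separate files and passed through `--pos_dir` and `--neg_dir`.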
- Comparison between the Recurrent Convolutional Neural Network and a Convolutional Neural Network.
- dennybritz's cnn-text-classification-tf is used as the CNN model for comparison.
- The same pre-trained word2vec model is used for both models.
- Recurrent Convolutional Neural Network for Text Classification (AAAI 2015), S Lai et al. [paper]