Chinese Poetry Generation

This project aims to implement and improve upon the classical Chinese poetry generation system proposed in "Chinese Poetry Generation with Planning based Neural Network".

Dependencies

Python 2.7
TensorFlow 1.2.1
Jieba 0.38
Gensim 2.0.0
pypinyin 0.23

Features

Network:

Bidirectional encoder
Attention decoder

Training and Predicting:

Alignment boosted word2vec
Data loading mode: only keywords (no preceding sentences)
Data loading mode: reversed
Data loading mode: aligned
Training mode: ground truth
Training mode: scheduled sampling
Predicting mode: greedy
Predicting mode: sampling
Predicting mode: beam search
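
The three predicting modes differ only in how the next token is chosen from the decoder's output distribution. The following is a minimal, framework-free sketch of that idea, not the code in predict.py. The `step_fn` callback is a hypothetical stand-in for one decoder step, and all function names are assumptions made for illustration.

```python
import math
import random


def greedy_pick(probs):
    """Greedy mode: return the index with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample_pick(probs, rng=random):
    """Sampling mode: draw an index at random, weighted by probability."""
    r = rng.random() * sum(probs)
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def beam_search(step_fn, start, length, beam_width=3):
    """Beam search mode: keep the beam_width best partial sequences.

    step_fn(sequence) is a hypothetical callback that returns a list of
    next-token probabilities given the tokens generated so far.
    Sequences are scored by their total log-probability.
    """
    beams = [([start], 0.0)]
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, p in enumerate(step_fn(seq)):
                if p > 0.0:
                    candidates.append((seq + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]
```

Greedy decoding commits to the locally best token at every step, while beam search can recover a sequence whose first token looked worse but whose overall probability is higher.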

Refinement:

Output refiner
Reinforcement learning tuner
Iterative polishing

Evaluation:

Evaluation: rhyming
Evaluation: tonal structure
Evaluation: alignment score
Evaluation: BLEU score
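
As a rough illustration of the BLEU metric listed above, the sketch below computes a character-level BLEU score against a single reference, using clipped n-gram precision and a brevity penalty. It is an assumption-laden simplification: the function names, the choice of `max_n=2`, and the single-reference setup are illustrative and may not match what evaluate.py does.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it appears in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / float(total)


def bleu(candidate, reference, max_n=2):
    """Geometric mean of the n-gram precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) >= len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1.0 - float(len(reference)) / len(candidate))
    return penalty * math.exp(log_avg)
```

Because classical Chinese lines are short, a low `max_n` keeps the score from collapsing to zero whenever no long n-gram happens to match.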

Project Structure

Data
data: directory for raw data, processed data, pre-processed starterkit data, and generated poetry samples
model: directory for saved neural network models
log: directory for training logs
notebooks: directory for exploratory/experimental IPython notebooks
training_scripts: directory for sample scripts used for training several basic models

Code
model.py: graph definition
train.py: training logic
predict.py: prediction logic
plan.py: keyword planning logic
main.py: user interaction program

Data Processing

To prepare training data:

python data_utils.py

Detail
This script does the following, in order:

Parse corpus

Build vocab

Filter quatrains

Count words

Rank words

Generate training data

Note
The TextRank algorithm may take many hours to run.
You can interrupt the iterations and stop it early
once the progress shown in the terminal has remained stationary for a long time.
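
For intuition about why stopping early is usually safe, here is a minimal sketch of a TextRank-style iteration with an automatic convergence check. It is not the implementation in rank_words.py. The graph format, the function name, and the `damping`, `tol`, and `max_iter` parameters are all assumptions chosen for illustration.

```python
def textrank(graph, damping=0.85, tol=1e-4, max_iter=100):
    """Score nodes of a symmetric weighted graph.

    graph maps each node to {neighbor: weight}; every neighbor must
    also appear as a key. Iteration stops once no score changes by
    more than tol, or after max_iter rounds.
    """
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    total_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        new_scores = {}
        for node in graph:
            rank = 0.0
            for nbr, weight in graph[node].items():
                if total_weight[nbr] > 0:
                    # Each neighbor passes on a share of its score
                    # proportional to the edge weight.
                    rank += weight / float(total_weight[nbr]) * scores[nbr]
            new_scores[node] = (1.0 - damping) + damping * rank
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores
```

Once the largest per-node change falls below a small tolerance, further iterations barely alter the ranking, which is the same signal as the terminal progress no longer moving.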

Then, to generate the word embedding:

python word2vec.py

Alternative
As an alternative, we have also provided pre-processed data in the data/starterkit directory.
To skip the data processing step, simply run cp data/starterkit/* data/processed

Training

To train the default model:

python train.py

To view the full list of configurable training parameters:

python train.py -h

Note
You should almost always train a new model after modifying any of the parameters.
Models are saved to model/ by default. To train a new model, either remove the existing model from model/
or specify a new model path during training with python train.py --model_dir :new_model_dir:

Generating

To start the user interaction program:

python main.py

Similarly, to view the full list of configurable predicting parameters:

python main.py -h

Note
The program currently does not check that the prediction parameters match the corresponding training parameters.
The user has to ensure, in particular, that the data loading modes correspond to the ones used during training
(e.g. if the training data is reversed and aligned, then the prediction input should also be reversed and aligned).
Otherwise, results may range from subtle differences in output to a total crash.
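
Until such a check exists, one way to guard against mismatches is to store the training parameters next to the model and compare them before predicting. The sketch below illustrates that idea only. The key names in `DATA_MODE_KEYS`, the JSON file, and all function names are hypothetical and are not part of this project; the real flags are the ones listed by python train.py -h.

```python
import json

# Hypothetical parameter names, used for illustration only.
DATA_MODE_KEYS = ("reverse", "align", "keywords_only")


def save_params(params, path):
    """Write the training parameters to a JSON file."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)


def find_mismatches(train_params, predict_params, keys=DATA_MODE_KEYS):
    """Return {key: (train_value, predict_value)} for each differing key."""
    return {
        key: (train_params.get(key), predict_params.get(key))
        for key in keys
        if train_params.get(key) != predict_params.get(key)
    }


def check_params(train_params_path, predict_params):
    """Raise ValueError if prediction parameters differ from training."""
    with open(train_params_path) as f:
        train_params = json.load(f)
    mismatches = find_mismatches(train_params, predict_params)
    if mismatches:
        raise ValueError(
            "prediction parameters differ from training: %r" % (mismatches,)
        )
```

Failing fast with an explicit error is easier to diagnose than a subtly degraded poem or a crash deep inside the graph.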

Evaluating

To generate sample poems for evaluation:

python generate_samples.py

Detail
This script by default randomly samples 4000 poems from the training data and saves them as human poems. Then it uses entire poems as inputs to the planner, to create keywords for the predictor. The predicted poems are saved as machine poems.
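
The flow described above can be sketched as follows. This is only an outline under stated assumptions: `plan_fn` and `predict_fn` are hypothetical callables standing in for the planner and the predictor, and the function name and `seed` parameter are not taken from generate_samples.py.

```python
import random


def build_eval_pairs(poems, plan_fn, predict_fn, k=4000, seed=None):
    """Pair each sampled human poem with a machine poem.

    poems      -- list of training poems
    plan_fn    -- hypothetical planner: whole poem -> keywords
    predict_fn -- hypothetical predictor: keywords -> generated poem
    """
    rng = random.Random(seed)
    # Sample without replacement; cap k at the corpus size.
    human_poems = rng.sample(poems, min(k, len(poems)))
    pairs = []
    for poem in human_poems:
        keywords = plan_fn(poem)
        pairs.append((poem, predict_fn(keywords)))
    return pairs
```

Deriving the keywords from the human poem itself gives each machine poem a directly comparable human counterpart on the same topic.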

To evaluate the generated poems:

python evaluate.py

Acknowledgement

The data processing source code is based on DevinZ1993's implementation.
The neural network implementation is inspired by JayParks's work.
