This repository contains the code for our paper: Training with One2MultiSeq: CopyBART for Social Media Keyphrase Generation.
The datasets can be downloaded from here.
For more details about the Twitter dataset, please see here or contact us at gaochunyang@mail.hfut.edu.cn.
To preprocess the source data, run:
python One2MultiSeq_dataprocess.py
To train the model, run:
python train_One2MultiSeq.py
After training, change model_name on line 707 to the path of the trained model (for example, model_name = 'models/temp_model/CMKP/CopyBART_One2MultiSeq_base_epochs-10_learning_rate-5e-05_batch_size-32_seed-100') and set is_train = False in train_One2MultiSeq.py.
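As a minimal sketch, the two edited settings might look like the following. The path is the example given above; the surrounding code in train_One2MultiSeq.py is not reproduced here and may be laid out differently.

```python
# Illustrative excerpt only: the two settings as they might appear
# in train_One2MultiSeq.py after editing (layout is an assumption).

# Path to the trained model (example path from this README).
model_name = 'models/temp_model/CMKP/CopyBART_One2MultiSeq_base_epochs-10_learning_rate-5e-05_batch_size-32_seed-100'

# Disable training so the script uses the trained model at model_name.
is_train = False
```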
Note:
- Please download and unzip the datasets into the ./data directory first.
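The order of the steps above (datasets in place, then preprocessing, then training) can be sketched as a small wrapper. This is not part of the repository: the helper names data_dir_ready and run_pipeline are hypothetical, and the check only confirms that ./data exists and is non-empty, not that the expected files are present.

```python
from pathlib import Path
import subprocess
import sys


def data_dir_ready(data_dir="./data"):
    """Return True if data_dir exists and contains at least one entry."""
    path = Path(data_dir)
    return path.is_dir() and any(path.iterdir())


def run_pipeline(data_dir="./data"):
    """Run preprocessing, then training, stopping at the first failure.

    Hypothetical helper: it simply calls the two scripts named in this README.
    """
    if not data_dir_ready(data_dir):
        raise FileNotFoundError(
            f"Download and unzip the datasets into {data_dir} first."
        )
    for script in ("One2MultiSeq_dataprocess.py", "train_One2MultiSeq.py"):
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run([sys.executable, script], check=True)
```

Calling run_pipeline() from the repository root would then run both scripts in sequence, or raise an error early if the datasets are missing.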