ParagraphJointModel

Implementation of the AAAI-21 Workshop on Scientific Document Understanding paper "A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification". A short video is available for this work! This work ranked in the top 2 of the SciFact leaderboard as of March 30, 2021.

Reproducing SciFact Leaderboard Result

Dependencies

We recommend creating an Anaconda environment:

conda create --name scifact python=3.6 conda-build

Then, from the scifact project root, run

conda develop .

which will add the scifact code to your PYTHONPATH.

Then, install Python requirements:

pip install -r requirements.txt

If you encounter any installation problem regarding sent2vec, please check their repo.
The BioSentVec model is available here.

The SciFact claim files and corpus file are available at the SciFact repo. The checkpoint of the Paragraph-Joint model used for the paper (trained on the training set) is available here. The checkpoint used for the leaderboard submission (trained on train+dev) is available here.

Abstract Retrieval

python ComputeBioSentVecAbstractEmbedding.py --claim_file /path/to/claims.jsonl --corpus_file /path/to/corpus.jsonl --sentvec_path /path/to/sentvec_model

python SentVecAbstractRetriaval.py --claim_file /path/to/claims.jsonl --corpus_file /path/to/corpus.jsonl --k_retrieval 30 --claim_retrieved_file /output/path/of/retrieval_file.jsonl --scifact_abstract_retrieval_file /output/path/of/retrieval_file_scifact_format.jsonl
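The retrieval step embeds each claim and each abstract with BioSentVec, then keeps the `k` most similar abstracts. A minimal sketch of that idea using cosine similarity over placeholder embeddings (the function name and toy data below are illustrative, not the scripts' actual API):

```python
import numpy as np

def top_k_abstracts(claim_emb, abstract_embs, k=30):
    """Return indices of the k abstracts most cosine-similar to the claim."""
    # Normalize so dot products equal cosine similarities.
    claim = claim_emb / np.linalg.norm(claim_emb)
    abstracts = abstract_embs / np.linalg.norm(abstract_embs, axis=1, keepdims=True)
    sims = abstracts @ claim
    # argsort is ascending: take the last k and reverse for best-first order.
    return np.argsort(sims)[-k:][::-1]

# Toy example with random 5-dimensional "embeddings".
rng = np.random.default_rng(0)
claim_emb = rng.normal(size=5)
abstract_embs = rng.normal(size=(10, 5))
print(top_k_abstracts(claim_emb, abstract_embs, k=3))
```

In the actual pipeline the embeddings come from the BioSentVec model loaded via sent2vec, and `k_retrieval` plays the role of `k`.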

The retrieved abstracts are available here: train, dev, test.

Training the ParagraphJoint Model (Optional for Reproducing Results)

FEVER Pre-training

You need to retrieve some negative samples for FEVER pre-training. We used the retrieval code from here. Empirically, retrieving only 5 negative examples per claim is enough, and retrieving more can be prohibitively time-consuming. You also need to convert the output format of the retrieval code to the SciFact input format.

For your convenience, the converted retrieved FEVER examples with k_retrieval=15 are available: train, dev.

The checkpoint of the Paragraph-Joint model pre-trained only on the retrieved FEVER examples shared above is available here.

Run FEVER_joint_paragraph_dynamic.py to pre-train the model on FEVER, using --checkpoint to specify the checkpoint path. Then run scifact_joint_paragraph_dynamic.py to fine-tune on the SciFact dataset, using --pre_trained_model to load the pre-trained model. Please check the other options in the source files.
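The joint model trains rationale selection (a binary decision per sentence) and stance prediction (one label per paragraph) together. A hedged numpy sketch of such a combined objective, as a weighted sum of the two cross-entropy terms (the weighting and the exact loss forms here are illustrative assumptions, not the repository's precise formulation):

```python
import numpy as np

def binary_ce(probs, labels):
    """Mean binary cross-entropy over the sentences of a paragraph."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

def categorical_ce(logits, label):
    """Cross-entropy of one stance label against paragraph-level logits."""
    logits = logits - logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def joint_loss(rationale_probs, rationale_labels, stance_logits, stance_label, lam=1.0):
    """Multi-task loss: rationale term + lam * stance term."""
    return (binary_ce(rationale_probs, rationale_labels)
            + lam * categorical_ce(stance_logits, stance_label))

# Toy paragraph: 4 sentences; stance classes {0: NOT_ENOUGH_INFO, 1: SUPPORT, 2: CONTRADICT}.
loss = joint_loss(np.array([0.9, 0.1, 0.8, 0.2]),
                  np.array([1, 0, 1, 0]),
                  np.array([0.1, 2.0, -1.0]),
                  stance_label=1)
print(round(loss, 4))
```

Sharing the paragraph encoder between the two heads is what lets each task regularize the other, which is the motivation for the joint formulation.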

Joint Prediction of Rationale Selection and Stance Prediction

python scifact_joint_paragraph_dynamic_prediction.py --corpus_file /path/to/corpus.jsonl --test_file /path/to/retrieval_file.jsonl --dataset /path/to/scifact/claims_test.jsonl --batch_size 25 --k 30 --prediction /path/to/output.jsonl --evaluate --checkpoint /path/to/checkpoint
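The prediction script writes one JSON object per claim, merging the selected rationale sentences and the predicted stance per abstract. A minimal sketch of assembling such a jsonl file (the field names follow the SciFact evidence format as commonly used; treat them as an assumption and check the script's actual output):

```python
import json

def write_predictions(predictions, path):
    """predictions: list of (claim_id, {doc_id: (sentence_indices, label)})."""
    with open(path, "w") as f:
        for claim_id, docs in predictions:
            record = {
                "id": claim_id,
                "evidence": {
                    str(doc_id): {"sentences": sents, "label": label}
                    for doc_id, (sents, label) in docs.items()
                },
            }
            f.write(json.dumps(record) + "\n")

# Hypothetical claim 13 supported by sentences 0 and 2 of abstract 4983.
write_predictions([(13, {4983: ([0, 2], "SUPPORT")})], "output.jsonl")
print(open("output.jsonl").read().strip())
```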

File naming conventions

The file names should be self-explanatory. Most parameters are set to reasonable default values and should be straightforward.

Non-Joint Models

File names containing rationale or stance are the scripts for the separate rationale selection and stance prediction models.

FEVER Pretraining and Domain-Adaptation

File names containing FEVER are scripts for training on the FEVER dataset; the same convention applies to domain_adaptation.

Prediction

File names containing prediction are scripts that take the trained models and perform inference.

KGAT

File names containing kgat denote models that use KGAT as the stance predictor.

Fine-tuning

You can use --pre_trained_model path/to/pre_trained.model to load a model trained on FEVER dataset and fine-tune on SciFact.
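When the fine-tuning model differs slightly from the pre-trained one (for example, a stance head resized for a different label set), checkpoint loading typically keeps only the parameters whose names and shapes match. A small framework-free sketch of that filtering step (the parameter names and the 2-to-3-class example are hypothetical; the actual script handles loading via --pre_trained_model):

```python
import numpy as np

def filter_state_dict(pretrained, model):
    """Return the subset of `pretrained` whose names and shapes match `model`,
    plus the list of skipped parameter names."""
    kept, skipped = {}, []
    for name, tensor in pretrained.items():
        if name in model and model[name].shape == tensor.shape:
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped

# Toy example: the stance head grew from 2 to 3 classes after FEVER pre-training,
# so only the shared encoder weights carry over.
pretrained = {"encoder.weight": np.zeros((4, 4)), "stance_head.weight": np.zeros((2, 4))}
model = {"encoder.weight": np.ones((4, 4)), "stance_head.weight": np.ones((3, 4))}
kept, skipped = filter_state_dict(pretrained, model)
print(sorted(kept), skipped)
```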

Cite our paper

@inproceedings{li2021paragraph,
  title={A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification},
  author={Li, Xiangci and Burns, Gully A and Peng, Nanyun},
  booktitle={SDU@AAAI},
  year={2021}
}
