Skip to content
/ ONION Public

Official implementation of the EMNLP 2021 paper "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"

License

Notifications You must be signed in to change notification settings

thunlp/ONION

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ONION

Official implementation of the EMNLP 2021 paper "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks". This codebase is highly based on the implementation of HiddenKiller.

The data folder contains some of our experimented clean data and rare words based poisoned data (BadNets). The poisoning rate is 5%.

Train a Poisoned Victim Model

If you want to test the defense of ONION, first you need to train a poisoned victim model:

CUDA_VISIBLE_DEVICES=0 python run_poison_bert.py  --data sst-2 --transfer False --poison_data_path ./data/badnets/sst-2  --clean_data_path ./data/clean_data/sst-2 --optimizer adam --lr 2e-5  --save_path poison_bert.pkl

Test the Defense Effectiveness of ONION

To test ONION defense on SST-2 against BadNets, please run

CUDA_VISIBLE_DEVICES=0 python test_defense.py  --data sst-2 --model_path poison_bert.pkl  --poison_data_path ./data/badnets/sst-2/test.tsv  --clean_data_path ./data/clean_data/sst-2/dev.tsv

Here, --model_path is the --save_path in run_poison_bert.py that assigns the path to the saved poisoned victim model.

If you want to conduct experiments on other datasets, just follow the file structures, and go over the above procedure.

Citation

Please kindly cite our paper:

@article{qi2020onion,
  title={Onion: A simple and effective defense against textual backdoor attacks},
  author={Qi, Fanchao and Chen, Yangyi and Li, Mukai and Yao, Yuan and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2011.10369},
  year={2020}
}

About

Official implementation of the EMNLP 2021 paper "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages