This repository contains Keras implementations of the ACL 2019 paper *Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency*.
- `data_set/aclImdb/`, `data_set/ag_news_csv/`, and `data_set/yahoo_10/` are placeholder directories for the IMDB Review, AG's News, and Yahoo! Answers datasets, respectively.
- `word_level_process.py` and `char_level_process.py` contain the word-level and char-level dataset preprocessing, respectively.
- `neural_networks.py` contains implementations of the four neural networks used in the paper (word-based CNN, Bi-directional LSTM, LSTM, and char-based CNN).
- Use `training.py` to train the four NNs in `neural_networks.py`.
- `fool.py`, `evaluate_word_saliency.py`, `get_NE_list.py`, `adversarial_tools.py`, and `paraphrase.py` build the experiment pipeline (see the word-saliency sketch below).
- Use `evaluate_fool_results.py` to evaluate the classification accuracy and word replacement rate of the adversarial examples generated by PWWS.
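As background for the pipeline, here is a minimal sketch (not the repository's exact code) of the word saliency score from the paper, $S(x, w_i) = P(y_{true} \mid x) - P(y_{true} \mid \hat{x}_i)$, where $\hat{x}_i$ replaces word $w_i$ with an unknown token. The `predict_proba` callable is an assumed wrapper around a trained classifier:

```python
import numpy as np

def word_saliency(predict_proba, tokens, true_label, unk="<unk>"):
    """Saliency of each word: the drop in the true-class probability when
    that word is replaced by an unknown token. predict_proba is assumed to
    map a token list to a vector of class probabilities (e.g. a wrapper
    around a trained Keras model)."""
    base = predict_proba(tokens)[true_label]
    saliencies = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [unk] + tokens[i + 1:]
        saliencies.append(base - predict_proba(masked)[true_label])
    return np.array(saliencies)
```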
- Python 3.7.1.
- Versions of all dependent libraries are specified in `requirements.txt`. To reproduce the reported results, please make sure that the specified versions are installed.
- If you have not downloaded WordNet (a lexical database for the English language), use `nltk.download('wordnet')` to do so, as shown below (and uncomment line 14 in `paraphrase.py`).
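A one-time download, assuming NLTK is already installed from `requirements.txt`:

```python
import nltk

# Fetch the WordNet corpus used by paraphrase.py for synonym lookup.
nltk.download('wordnet')
```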
- Download the dataset files from Google Drive, which include:
  - IMDB: `aclImdb.zip`. Decompress it and place the folder `aclImdb` in `data_set/`.
  - AG's News: `ag_news_csv.zip`. Decompress it and place the folder `ag_news_csv` in `data_set/`.
  - Yahoo! Answers: `yahoo_10.zip`. Decompress it and place the folder `yahoo_10` in `data_set/`.
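After extraction, `data_set/` should contain the three placeholder directories listed above:

```
data_set/
├── aclImdb/
├── ag_news_csv/
└── yahoo_10/
```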
- Download `glove.6B.100d.txt` from Google Drive and place the file in `/` (the repository root).
- Run `training.py` or use a command like `python3 training.py --model word_cnn --dataset imdb --level word`. You can reset the model hyper-parameters in `neural_networks.py` and `config.py`. Note that neither this repository nor the paper provides an implementation of char_cnn on the IMDB and Yahoo! Answers datasets.
- Run `fool.py` or use a command like `python3 fool.py --model word_cnn --dataset imdb --level word` to generate adversarial examples using PWWS.
- Run `evaluate_fool_results.py` to evaluate the adversarial examples.
- If you want to train or fool different models, reset the arguments in `training.py` and `fool.py`. The full pipeline for one model/dataset pair is shown below.
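Putting the steps together for word_cnn on IMDB (swap `--model`, `--dataset`, and `--level` for other settings):

```bash
python3 training.py --model word_cnn --dataset imdb --level word  # train the classifier
python3 fool.py --model word_cnn --dataset imdb --level word      # generate adversarial examples with PWWS
python3 evaluate_fool_results.py                                  # evaluate accuracy and word replacement rate
```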
`runs/` contains some pretrained NN models; information about these models is shown in the following table. We used these pretrained models to generate 1000 adversarial examples with PWWS.

- `test_set` means classification accuracy on the test set.
- `clean_1000` means classification accuracy on the 1000 clean samples (drawn from the test set).
- `adv_1000` means classification accuracy on the adversarial examples corresponding to the 1000 clean samples.
- `sub_rate` means the word replacement rate defined in Section 4.4 of the paper (see the sketch after the table).
- `NE_rate` means (number of $NE_{adv}$, i.e. named-entity substitutions) / (number of substituted words).

If you want to use these models, rename them or modify the paths to the models in the `.py` files.
| data_set | neural_network | test_set | clean_1000 | adv_1000 | sub_rate | NE_rate |
|---|---|---|---|---|---|---|
| IMDB | word_cnn | 88.792% | 86.2% | 5.7% | 3.933% | 21.395% |
| | word_bdlstm | 87.472% | 86.8% | 2.0% | 4.206% | 11.094% |
| | word_lstm | 88.420% | 89.8% | 10.4% | 6.816% | 6.548% |
| AG's News | word_cnn | 90.526% | 89.0% | 13.2% | 12.308% | 30.877% |
| | word_bdlstm | 90.711% | 89.3% | 12.9% | 13.494% | 27.227% |
| | word_lstm | 91.829% | 91.4% | 18.1% | 18.102% | 27.374% |
| | char_cnn | 88.224% | 88.5% | 20.0% | 11.979% | 23.241% |
| Yahoo! Answers | word_cnn | 88.427% | 96.1% | 8.7% | 33.067% | 12.768% |
| | word_bdlstm | 88.876% | 94.4% | 9.4% | 20.752% | 7.016% |
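A minimal sketch of how `sub_rate` can be computed (a hypothetical helper, not the repository's exact code; it assumes the original and adversarial texts tokenize to the same length, which PWWS's word-for-word substitution preserves):

```python
def substitution_rate(original_tokens, adversarial_tokens):
    """Fraction of words PWWS replaced (sub_rate); assumes equal-length token lists."""
    assert len(original_tokens) == len(adversarial_tokens)
    changed = sum(o != a for o, a in zip(original_tokens, adversarial_tokens))
    return changed / len(original_tokens)

# Example: one of four words replaced -> sub_rate of 0.25.
print(substitution_rate("the movie was great".split(),
                        "the movie was superb".split()))  # 0.25
```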
- If you have any questions regarding the code, please create an issue or contact the owner of this repository.