Project Page | Paper | BibTeX
This repo contains the official PyTorch implementation of Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning (ICLR 2023 Spotlight), by Hao He*, Kaiwen Zha*, Dina Katabi (*co-primary authors).
- Install dependencies using conda:

  ```bash
  conda create -n contrastive-poisoning python=3.7
  conda activate contrastive-poisoning
  conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  pip install tensorboard
  pip install pillow==9.0
  pip install gdown
  git clone --recursive https://github.com/kaiwenzha/contrastive-poisoning.git
  cd kornia_pil
  pip install -e .
  ```
  In this work, we implemented PIL-based differentiable data augmentations (to match the PIL-based torchvision data augmentations) on top of kornia, a differentiable computer vision library for PyTorch.
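  For intuition, here is a minimal sketch of what "differentiable" buys us, using stock `kornia` augmentations rather than the repo's `kornia_pil` variant (whose API may differ): gradients flow through the augmentation pipeline back to the input pixels, which is what poison optimization requires.

  ```python
  import torch
  import kornia.augmentation as K

  # A SimCLR-style augmentation pipeline built from kornia's differentiable ops
  aug = torch.nn.Sequential(
      K.RandomResizedCrop((32, 32), scale=(0.2, 1.0)),
      K.RandomHorizontalFlip(p=0.5),
      K.ColorJitter(0.4, 0.4, 0.4, 0.1, p=0.8),
      K.RandomGrayscale(p=0.2),
  )

  x = torch.rand(8, 3, 32, 32, requires_grad=True)  # batch of CIFAR-sized images
  y = aug(x)               # augmented views, still on the autograd graph
  y.mean().backward()      # gradients reach the input pixels
  print(x.grad.shape)      # torch.Size([8, 3, 32, 32])
  ```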
- Download datasets (CIFAR-10, CIFAR-100):

  ```bash
  source download_cifar.sh
  ```
- Download all of our pretrained poisons (shown in the table below):

  ```bash
  gdown https://drive.google.com/drive/folders/1FeIHf_tD1bL776Q0PHWGI_rcAkmvQ2iE\?usp\=share_link --folder
  ```
| Attacker Type | SimCLR | MoCo v2 | BYOL |
| --- | --- | --- | --- |
| CP-S | 44.9 / poison | 55.1 / poison | 59.6 / poison |
| CP-C | 68.0 / poison | 61.9 / poison | 56.9 / poison |
The results in the table above (columns: victim's algorithm) assume that the victim's algorithm is known to the attacker, i.e., the attacker and the victim use the same CL algorithm.
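Conceptually, a poison is just a bounded perturbation added to the training images: CP-S learns one perturbation per sample, while CP-C shares one perturbation per class. Below is a minimal sketch of this idea (names, shapes, and the budget handling are illustrative assumptions, not the repo's actual code):

```python
import torch

def apply_poison(images, labels, delta, classwise=False):
    """images: (N, 3, H, W) in [0, 1]; labels: (N,) int64.
    delta: (num_classes, 3, H, W) if classwise (CP-C), else (N, 3, H, W) (CP-S)."""
    eps = 8.0 / 255.0                               # L-inf budget (cf. --delta_weight)
    delta = delta.clamp(-eps, eps)                  # enforce the perturbation budget
    noise = delta[labels] if classwise else delta   # CP-C: one shared delta per class
    return (images + noise).clamp(0.0, 1.0)         # keep a valid pixel range
```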
BYOL performance may differ slightly from what is reported in the table/paper above because, when releasing the code, we replaced the now-deprecated `apex.parallel.SyncBatchNorm` with `torch.nn.SyncBatchNorm`.
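For reference, this is the standard way to swap a model's BatchNorm layers for `torch.nn.SyncBatchNorm` (a generic illustration, not necessarily the exact call site in this repo):

```python
import torch
import torchvision

model = torchvision.models.resnet18()
# Recursively replaces every BatchNorm*D layer with torch.nn.SyncBatchNorm,
# so batch statistics are synchronized across distributed processes.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```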
To evaluate our pretrained poisons, re-train the corresponding CL model on the poisoned dataset by running

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 main.py \
    --dataset cifar10 \
    --arch resnet18 \
    --cl_alg [SimCLR/MoCov2/BYOL] \
    [--classwise or --samplewise] \
    --delta_weight $[8./255] \
    --folder_name eval_poisons \
    --epochs 1000 \
    --eval_freq 100 \
    --pretrained_delta pretrained_poisons/xxx.pth
```
Set the arguments `--cl_alg`, `--classwise` or `--samplewise`, and `--pretrained_delta` according to the poison you choose to evaluate. Taking the SimCLR CP-S poison (`cifar10_res18_simclr_cps.pth`) as an example, the running script should set `--cl_alg SimCLR`, `--samplewise`, and `--pretrained_delta pretrained_poisons/cifar10_res18_simclr_cps.pth`.
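Pretrained poisons are regular PyTorch checkpoints, so you can inspect one before evaluating it (the key layout is not documented here, so print it to locate the perturbation tensor):

```python
import torch

# Load a pretrained poison checkpoint and list its contents; which key holds
# the perturbation tensor is an assumption to verify by inspection.
ckpt = torch.load("pretrained_poisons/cifar10_res18_simclr_cps.pth", map_location="cpu")
print(ckpt.keys() if isinstance(ckpt, dict) else type(ckpt))
```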
This code supports training on CIFAR-10 and CIFAR-100.
To train a contrastive learning (CL) model (e.g., SimCLR, MoCov2, BYOL) on the clean dataset, run
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 main.py \
    --dataset cifar10 \
    --arch resnet18 \
    --cl_alg [SimCLR/MoCov2/BYOL] \
    --folder_name baseline \
    --baseline \
    --epochs 1000 \
    --eval_freq 100
```
- Run CP-C to generate the class-wise poison (a conceptual sketch of this alternating optimization is given after these steps):

  ```bash
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
      --dataset cifar10 \
      --arch resnet18 \
      --cl_alg [SimCLR/MoCov2/BYOL] \
      --classwise \
      --delta_weight $[8./255] \
      --folder_name CP_C \
      --epochs 1000 \
      --eval_freq 10000 \
      --print_freq 5 \
      --num_steps 1 \
      --step_size 0.1 \
      --model_step 20 \
      --noise_step 20 \
      [--allow_mmt_grad]
  ```

  Add the `--allow_mmt_grad` flag to enable dual-branch propagation when running on MoCov2 and BYOL.
- Re-train the CL model (e.g., SimCLR, MoCov2, BYOL) on the poisoned dataset generated by CP-C:

  ```bash
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
      --dataset cifar10 \
      --arch resnet18 \
      --cl_alg [SimCLR/MoCov2/BYOL] \
      --classwise \
      --delta_weight $[8./255] \
      --folder_name CP_C \
      --epochs 1000 \
      --eval_freq 100 \
      --pretrained_delta <.../last.pth>
  ```

  `--pretrained_delta` is the path to the model checkpoint from the previous (CP-C generation) step, which contains the generated poison.
- Run CP-S to generate the sample-wise poison:

  ```bash
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
      --dataset cifar10 \
      --arch resnet18 \
      --cl_alg [SimCLR/MoCov2/BYOL] \
      --samplewise \
      --delta_weight $[8./255] \
      --folder_name CP_S \
      --epochs 200 \
      --eval_freq 10000 \
      --num_steps 5 \
      --step_size 0.1 \
      --initialized_delta <.../last.pth or pretrained_poisons/cifar10_res18_xxx_cpc.pth> \
      [--allow_mmt_grad]
  ```

  - To get a stronger poison, we use the learned class-wise poison to initialize the sample-wise poison: `--initialized_delta` can be set either to the path of the model checkpoint produced by the CP-C generation step above, or to one of our generated CP-C poisons in the `pretrained_poisons` folder (note: the CL algorithm should match).
  - Add the `--allow_mmt_grad` flag to enable dual-branch propagation when running on MoCov2 and BYOL.
- Re-train the CL model (e.g., SimCLR, MoCov2, BYOL) on the poisoned dataset generated by CP-S:

  ```bash
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
      --dataset cifar10 \
      --arch resnet18 \
      --cl_alg [SimCLR/MoCov2/BYOL] \
      --samplewise \
      --delta_weight $[8./255] \
      --folder_name CP_S \
      --epochs 1000 \
      --eval_freq 100 \
      --pretrained_delta <.../last.pth> (for MoCov2 and BYOL) or <.../ckpt_epoch_160.pth> (for SimCLR)
  ```

  `--pretrained_delta` is the path to the model checkpoint from the previous (CP-S generation) step, which contains the generated poison.
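As referenced in the CP-C step above, here is a minimal, self-contained sketch of the alternating optimization that both generation steps implement: the CL model is trained to minimize the contrastive loss on poisoned data, then the poison is updated with PGD-style steps to minimize that same loss, so the poisoned data leaves the model little to learn. The toy encoder, InfoNCE loss, and augmentation stand-ins are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

eps, step_size = 8.0 / 255.0, 0.1                    # budget and --step_size
x = torch.rand(32, 3, 32, 32)                        # toy batch of images
delta = torch.zeros_like(x)                          # CP-S: one delta per sample
                                                     # (CP-S can also be initialized
                                                     # from a learned CP-C poison)

# Toy stand-in for the ResNet-18 encoder used in the repo
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

def info_nce(a, b, t=0.5):
    # InfoNCE over one positive pair per sample (SimCLR-style, simplified)
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.t() / t
    return F.cross_entropy(logits, torch.arange(a.size(0)))

def two_views(imgs):
    # Stand-in for the differentiable augmentations (kornia_pil in the repo)
    return imgs + 0.01 * torch.randn_like(imgs), imgs + 0.01 * torch.randn_like(imgs)

for it in range(10):
    for _ in range(20):                              # --model_step: update the CL model
        v1, v2 = two_views((x + delta).clamp(0, 1))
        loss = info_nce(encoder(v1), encoder(v2))
        opt.zero_grad(); loss.backward(); opt.step()
    for _ in range(20):                              # --noise_step: PGD step on the poison
        d = delta.detach().requires_grad_(True)
        v1, v2 = two_views((x + d).clamp(0, 1))
        loss = info_nce(encoder(v1), encoder(v2))
        g, = torch.autograd.grad(loss, d)
        delta = (d - step_size * g.sign()).clamp(-eps, eps).detach()
```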
To resume any interrupted model trained above, keep all commands unchanged and simply add `--resume <.../curr_last.pth>`, where the argument is the full path to the latest checkpoint (`curr_last.pth`) of the interrupted model.
This code is partly based on the open-source implementations from SupContrast, MoCo, lightly and kornia.
If you use this code for your research, please cite our paper:
```bibtex
@inproceedings{he2023indiscriminate,
    title={Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning},
    author={Hao He and Kaiwen Zha and Dina Katabi},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=f0a_dWEYg-Td}
}
```