Backdooring Bias into Text-to-Image Models

💡 Introduction

This is a repository for our paper Backdooring Bias into Text-to-Image Models.

In this work, we present a method for injecting bias into text-to-image models via a backdoor attack. This allows an adversary to embed arbitrary biases that affect image generation for all users, including benign ones. Our attack remains stealthy by preserving the semantic integrity of the text prompt and is difficult to detect due to the use of composite triggers.

🏃‍♂️ Run Attack

0. 💡 Install

Install diffusers before running our code. Run the following command or check the Huggingface website for more details:

pip install diffusers

1. ☠️ Generate Poisoning Dataset

We first generate the poisoned dataset for fine-tuning the pre-trained Stable Diffusion. You may change the corresponding categories for fine-tuning. Run:

python pkl_disk_midjourney.py

2. 🏋️‍♀️ Training (Backdoor Injection)

Fine-tune pre-trained Stable Diffusion model (2.0, XL, XL-Turbo) using the generated poisoned dataset. (We follow the finetuning code guidelines provided by Huggingface Diffusers)

For fine-tuning Stable Diffusion 2.0 or below:

./run.sh

For fine-tuning Stable Diffusion-XL or XL-Turbo:

./sdxl_run.sh

Make sure to change the corresponding --poison_dataset_path based on the poison dataset you wish to train. The dataset is available in the data directory.

3. 🛠 Inference

Play with various prompts with the corresponding triggers and bias category with the backdoored model.

finetune_playground.ipynb

4. ✅ Evaluation

For large scale LLaVA evaluation, run:

python llava_evaluation_large_scale.py

For individual scale LLaVA evaluation, run:

llava_evaluation.ipynb

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
data		data
images		images
models		models
.DS_Store		.DS_Store
.gitattributes		.gitattributes
README.md		README.md
finetune_playground.ipynb		finetune_playground.ipynb
llava_evaluation.ipynb		llava_evaluation.ipynb
llava_evaluation_large_scale.py		llava_evaluation_large_scale.py
pkl_to_disk_midjourney.py		pkl_to_disk_midjourney.py
run.sh		run.sh
sdxl_run.sh		sdxl_run.sh
train_text_to_image.py		train_text_to_image.py
train_text_to_image_sdxl.py		train_text_to_image_sdxl.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Backdooring Bias into Text-to-Image Models

💡 Introduction

🏃‍♂️ Run Attack

0. 💡 Install

1. ☠️ Generate Poisoning Dataset

2. 🏋️‍♀️ Training (Backdoor Injection)

3. 🛠 Inference

4. ✅ Evaluation

About

Releases

Packages

Languages

jrohsc/Backdororing_Bias

Folders and files

Latest commit

History

Repository files navigation

Backdooring Bias into Text-to-Image Models

💡 Introduction

🏃‍♂️ Run Attack

0. 💡 Install

1. ☠️ Generate Poisoning Dataset

2. 🏋️‍♀️ Training (Backdoor Injection)

3. 🛠 Inference

4. ✅ Evaluation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages