Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations

Official code implementation of "Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations," arXiv 2025.


Environment setup

git clone https://github.com/kim-dahye/steerers.git
cd steerers
conda env create -f steerers.yaml
conda activate steerers

0. Extract intermediate diffusion features

python collect_features/collect_i2p_sd14.py  # For unsafe concepts, SD 1.4
python collect_features/collect_i2p_sdxl.py  # For unsafe concepts, SDXL
python collect_features/collect_i2p_flux.py  # For unsafe concepts, FLUX

1. Train k-SAE

bash scripts/train_sd14_i2p.sh  # For unsafe concepts, SD 1.4
bash scripts/train_flux_i2p.sh  # For unsafe concepts, FLUX
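
For readers unfamiliar with k-sparse autoencoders, the forward pass that such training optimizes can be sketched as follows. This is a minimal NumPy illustration, not the code run by the training scripts: the function name `topk_sae_forward`, the weight shapes, the ReLU after top-k selection, and the tied initialization are all assumptions made for the example.

```python
import numpy as np

def topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Forward pass of a k-sparse autoencoder on a batch x of shape (n, d).

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    # Encoder pre-activations, shape (n, m)
    pre = (x - b_dec) @ w_enc + b_enc
    # Column indices of the k largest pre-activations in each row
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    # Keep only the top-k entries (with a ReLU, an assumption); zero the rest
    z = np.zeros_like(pre)
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    # Linear decoder reconstructs the input from the sparse code
    x_hat = z @ w_dec + b_dec
    return z, x_hat

# Toy data standing in for extracted diffusion features
rng = np.random.default_rng(0)
d, m, k = 8, 32, 4
x = rng.normal(size=(5, d))
w_enc = rng.normal(size=(d, m)) / np.sqrt(d)
w_dec = w_enc.T.copy()  # tied initialization, an assumption
b_enc = np.zeros(m)
b_dec = np.zeros(d)

z, x_hat = topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k)
mse = float(np.mean((x - x_hat) ** 2))  # reconstruction loss to minimize
```

Training would minimize `mse` over the extracted features; at most `k` latents are active per input, which is what makes individual latents candidates for concept-level control.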

2. Generate images from prompts

bash scripts/nudity_gen_sd14.sh  # For nudity concept, SD 1.4
bash scripts/violence_gen_sd14.sh  # For violence concept, SD 1.4

3. Evaluate unsafe concept removal

To evaluate, first download the appropriate classifier for each category and place it inside the eval folder:

  • Nudity: download the NudeNet Detector
  • Violence: download prompts.p for the Q16 classifier

Then run the following commands:

python Eval/compute_nudity_rate.py --root i2p_result/sd14_exp4_layer9  # For nudity concept
python get_Q16_accuracy.py --path violence_result/sd14_exp4_layer9  # For violence concept
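
As a rough guide to the reported metric, an unsafe-content rate of this kind is typically the fraction of generated images that the classifier flags. The sketch below assumes one boolean flag per image; `unsafe_rate` and the sample flags are hypothetical and are not taken from the evaluation scripts.

```python
def unsafe_rate(flags):
    """Return the fraction of images flagged as unsafe.

    flags: iterable of booleans, one per generated image (hypothetical input).
    """
    flags = list(flags)
    if not flags:
        raise ValueError("no images to evaluate")
    return sum(bool(f) for f in flags) / len(flags)

# Hypothetical classifier outputs for five generated images
flags = [True, False, False, True, False]
rate = unsafe_rate(flags)
print(f"unsafe rate: {rate:.2%}")
```

A lower rate after steering than before would indicate that the concept is being suppressed more effectively.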

Play with the Jupyter notebook

style_change.ipynb

Citing our work

@article{kim2025concept,
  title={Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations},
  author={Kim, Dahye and Ghadiyaram, Deepti},
  journal={arXiv preprint arXiv:2501.19066},
  year={2025}
}