Official code implementation of "Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations," arXiv 2025.
git clone https://github.com/kim-dahye/steerers.git
cd steerers
conda env create -f steerers.yaml
conda activate steerers
python collect_features/collect_i2p_sd14.py # For unsafe concepts, SD 1.4
python collect_features/collect_i2p_sdxl.py # For unsafe concepts, SDXL
python collect_features/collect_i2p_flux.py # For unsafe concepts, FLUX
bash scripts/train_sd14_i2p.sh # For unsafe concepts, SD 1.4
bash scripts/train_flux_i2p.sh # For unsafe concepts, FLUX
bash scripts/nudity_gen_sd14.sh # For nudity concept, SD 1.4
bash scripts/violence_gen_sd14.sh # For violence concept, SD 1.4
To evaluate, first download the appropriate classifier for each category and place it inside the eval folder:
- Nudity: download the NudeNet Detector
- Violence: download the prompts.p for the Q16 classifier
Then, run the following commands:
python Eval/compute_nudity_rate.py --root i2p_result/sd14_exp4_layer9 # For nudity concept
python get_Q16_accuracy.py --path violence_result/sd14_exp4_layer9 # For violence concept
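The two evaluation commands above can be wrapped in a small driver so that both metrics run in one go. This is a minimal sketch and not part of the repository: the function names `build_eval_commands` and `run_eval` are hypothetical, while the script paths, flags, and default result directories are copied verbatim from the commands above. It assumes it is run from the repository root.

```python
import subprocess
import sys


def build_eval_commands(nudity_root="i2p_result/sd14_exp4_layer9",
                        violence_path="violence_result/sd14_exp4_layer9"):
    """Return the two evaluation commands as argument lists.

    Hypothetical helper: script names and flags mirror the README commands.
    """
    return [
        # Nudity rate, computed over the generated images under `nudity_root`.
        [sys.executable, "Eval/compute_nudity_rate.py", "--root", nudity_root],
        # Q16 accuracy, computed over the generated images under `violence_path`.
        [sys.executable, "get_Q16_accuracy.py", "--path", violence_path],
    ]


def run_eval(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing evaluation script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is executed until the paths are confirmed.
    run_eval(build_eval_commands(), dry_run=True)
```

Calling `run_eval(build_eval_commands(), dry_run=False)` runs both scripts in order. Pass different directories to `build_eval_commands` to evaluate other experiments.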
style_change.ipynb
@article{kim2025concept,
title={Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations},
author={Kim, Dahye and Ghadiyaram, Deepti},
journal={arXiv preprint arXiv:2501.19066},
year={2025}
}