master: branch for public viewing.
development: branch containing notebooks for early dataset exploration, debugging, unbatched inference, and making plots.
Modify <largefiles_dir> if you keep large files in a separate directory.
Calculation of the proposed metrics, COMPLETENESS and BALANCE: quantifying_skew.ipynb
git clone git@github.com:zdxdsw/skewed_relations_T2I.git &&
cd skewed_relations_T2I &&
python3 -m venv venv &&
source venv/bin/activate &&
pip install --upgrade pip &&
pip install -r requirements.txt
Troubleshooting: If you run into an ImportError or incompatibility issues, try installing these specific versions:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
This requires CUDA 11.8. If your machine has multiple CUDA versions installed, you may need to run:
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib
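To see whether the installed packages match the pinned versions, a small check like the following can help. This is a minimal sketch using only the standard library; the helper names `installed_version` and `report_mismatches` are illustrative and not part of the repository.

```python
from importlib import metadata

# Versions pinned in the troubleshooting command above.
PINNED = {"torch": "2.2.2", "torchvision": "0.17.2", "torchaudio": "2.2.2"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_mismatches(pinned=PINNED):
    """Map each package that is missing or differs from its pin to (found, wanted)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = installed_version(package)
        # Local build tags such as "+cu118" are ignored when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    for package, (found, wanted) in report_mismatches().items():
        print(f"{package}: found {found}, expected {wanted}")
```

An empty result means all three packages match the pins; it does not verify that the CUDA runtime itself is usable.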
$ accelerate config
# This will automatically generate ~/.cache/huggingface/accelerate/default_config.yaml
Example config:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
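Before launching, it can be useful to confirm that num_processes in the config does not exceed the number of GPUs you intend to use. The sketch below assumes one process per GPU and parses only flat `key: value` lines like those in the example config; it is not a general YAML parser, and both function names are hypothetical helpers rather than part of accelerate or this repository.

```python
def parse_flat_yaml(text):
    """Parse simple top-level `key: value` lines into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip("'\"")
    return config


def check_num_processes(config, available_gpus):
    """Return warnings if the config asks for more processes than GPUs."""
    warnings = []
    num_processes = int(config.get("num_processes", 1))
    if config.get("distributed_type") == "MULTI_GPU" and num_processes > available_gpus:
        warnings.append(
            f"num_processes={num_processes} exceeds available GPUs ({available_gpus})"
        )
    return warnings


example = """
distributed_type: MULTI_GPU
mixed_precision: fp16
num_processes: 4
"""
config = parse_flat_yaml(example)
print(check_num_processes(config, available_gpus=4))  # []
```

On a machine with fewer than four GPUs, lower num_processes in the generated config accordingly.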
Configure your training hyperparameters in skewed_relations_T2I/scripts/diffuser_icons/config.py.
To reproduce the results in our paper, copy the configs from skewed_relations_T2I/scripts/diffuser_icons/configs/pixel_icons_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_icons/configs/pixel_icons_twoobjs_ft_config.py.
Because the synthetic data is simple, we do not save a copy; it is constructed on the fly in the dataloader. Please refer to dataset.py for how splits with different degrees of skew are created, and to this summary chart for the mapping from split_method to metrics.
cd skewed_relations_T2I/scripts/diffusion_icons
accelerate launch trainer.py
cd skewed_relations_T2I/scripts/diffusion_icons
accelerate launch tester.py --load_from_dir <handle> --load_from_epochs <load_from_epochs> --eval_batch_size <eval_batch_size>
<handle>: Every experiment has a unique identifier, created from the timestamp at which it was launched, e.g. 0515_222602 (%m%d_%H%M%S).
<load_from_epochs>: A string of epoch numbers separated by spaces, e.g. "99 199 299 399 499 599".
<eval_batch_size>: Per-GPU batch size.
By default, tester.py runs inference on both the training and testing sets. To skip the training (testing) set, pass --num_iter_train 0 (--num_iter_test 0).
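The handle format and the epoch string can be illustrated with a few lines of Python. This is a sketch of the conventions described above, not the repository's own parsing code; the function names and the launch date (including the year) are made up for the example.

```python
from datetime import datetime

HANDLE_FORMAT = "%m%d_%H%M%S"  # month, day, underscore, hour, minute, second


def make_handle(launch_time):
    """Format a launch timestamp the way experiment handles are written."""
    return launch_time.strftime(HANDLE_FORMAT)


def parse_epochs(load_from_epochs):
    """Split a space-separated epoch string into a list of integers."""
    return [int(token) for token in load_from_epochs.split()]


handle = make_handle(datetime(2024, 5, 15, 22, 26, 2))
epochs = parse_epochs("99 199 299 399 499 599")
print(handle)  # 0515_222602
print(epochs)  # [99, 199, 299, 399, 499, 599]
```

Because the epochs are passed as one string, remember to quote the argument on the command line so the shell does not split it.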
Fixed filters are created from GTH icons. Generated images are then evaluated via pixel-level pattern matching. Please refer to this notebook.
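To give a rough idea of what pixel-level pattern matching with a fixed filter can look like, here is a toy sketch in pure Python. It slides a small binary template over a binary image and reports exact matches. The actual evaluation lives in the notebook and may differ (for example in thresholds, color handling, or tolerance), so treat `match_positions` and the toy data as illustrative assumptions only.

```python
def match_positions(image, template):
    """Return the (row, col) offsets where `template` matches `image` exactly."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    hits = []
    for top in range(image_h - tmpl_h + 1):
        for left in range(image_w - tmpl_w + 1):
            if all(
                image[top + r][left + c] == template[r][c]
                for r in range(tmpl_h)
                for c in range(tmpl_w)
            ):
                hits.append((top, left))
    return hits


# Toy 2x2 "icon" and a 4x5 binary image that contains it exactly once.
icon = [[1, 0],
        [1, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 0, 0, 0]]
print(match_positions(image, icon))  # [(1, 2)]
```

The returned offsets give the location of each detected icon, which is the kind of information needed to check a spatial relation between two objects.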
To disable image positional embeddings, comment out the line patch_size = 2 in config.py or set patch_size = None. (This requires re-running both single-object pretraining and two-object finetuning.)
To switch the language encoder from T5 to CLIP, modify config.py: lm = "t5" <--> lm = "clip_"
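For reference, the two ablation settings described above would look like this in config.py. This fragment shows only these two variables, with the values quoted in the text; every other setting in the real file is omitted.

```python
# config.py (fragment showing only the two ablation switches)

# Disable image positional embeddings (the text above quotes patch_size = 2 as the line to change).
patch_size = None

# Use CLIP instead of T5 as the language encoder ("t5" selects T5).
lm = "clip_"
```

Change one switch at a time if you want to attribute a difference in results to a single ablation.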
Images are released by the WhatsUp official repo. Download controlled_clevr.tar.gz from https://drive.google.com/drive/u/0/folders/164q6X9hrvP-QYpi3ioSnfMuyHpG5oRkZ.
cd <largefiles_dir>/skewed_relations_T2I &&
mkdir -p data/whatsup_vlms
Move the folder controlled_clevr to <largefiles_dir>/skewed_relations_T2I/data/whatsup_vlms/.
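If you prefer to unpack the archive directly into place, a sketch along these lines may work. It assumes the archive unpacks to a top-level controlled_clevr folder, which the README implies but which is worth verifying after extraction; `extract_archive` is a hypothetical helper, not part of the repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir, creating the directory if needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


# Example call (replace <largefiles_dir> with your own path before running):
# extract_archive(
#     "controlled_clevr.tar.gz",
#     "<largefiles_dir>/skewed_relations_T2I/data/whatsup_vlms",
# )
```

Only extract archives from sources you trust, since tar files can contain unexpected paths.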
WhatsUp annotation files are preprocessed (filtered for selected relations and objects) and saved to skewed_relations_T2I/data/aggregated. Refer to whatsup_preprocess.ipynb for the preprocessing code.
Configure your training hyperparameters in skewed_relations_T2I/scripts/diffuser_real/config.py.
To reproduce the results in our paper, copy the configs from skewed_relations_T2I/scripts/diffuser_real/configs/pixel_natural_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_real/configs/pixel_natural_twoobjs_ft_config.py.
Instances are converted to the tuple representation. Please refer to dataset.py for how subsamples with different degrees of skew are drawn, and to this summary chart for the mapping from subsample_method to metrics.
cd skewed_relations_T2I/scripts/diffusion_real
accelerate launch trainer.py
cd skewed_relations_T2I/scripts/diffusion_real
accelerate launch tester.py --load_from_dir <handle> --load_from_epochs <load_from_epochs> --eval_batch_size <eval_batch_size>
<handle>: Every experiment has a unique identifier, created from the timestamp at which it was launched, e.g. 0515_222602 (%m%d_%H%M%S).
<load_from_epochs>: A string of epoch numbers separated by spaces, e.g. "99 199 299 399 499 599".
<eval_batch_size>: Per-GPU batch size.
By default, tester.py runs inference on both the training and testing sets. To skip the training (testing) set, pass --num_iter_train 0 (--num_iter_test 0).
cd <largefiles_dir>/skewed_relations_T2I &&
mkdir autoeval
Download the finetuned ViT checkpoint from here (328MB) and move it to <largefiles_dir>/skewed_relations_T2I/autoeval.
For your reference, we provide code for finetuning ViT.
cd skewed_relations_T2I/scripts/diffusion_real
python eval.py --ckpt_handle <handle> --epochs_for_eval <epochs_for_eval> --output_folder <output_folder> # single_gpu job
<handle>: Every experiment has a unique identifier, created from the timestamp at which it was launched, e.g. 0515_222602 (%m%d_%H%M%S).
<epochs_for_eval>: A string of epoch numbers separated by spaces, e.g. "1999 3999 5999".
<output_folder>: E.g. "output" or "output_withvae".
cd <largefiles_dir>/skewed_relations_T2I &&
mkdir -p from_pretrained/vae/sd2 &&
cd from_pretrained/vae/sd2 &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/config.json &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.bin &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.fp16.bin &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.fp16.safetensors &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.safetensors
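If wget is not available, the same five files can be fetched with Python's standard library. This is a minimal sketch under the assumption that the URLs above are reachable without authentication; `vae_urls` and `download_vae` are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae"
FILES = [
    "config.json",
    "diffusion_pytorch_model.bin",
    "diffusion_pytorch_model.fp16.bin",
    "diffusion_pytorch_model.fp16.safetensors",
    "diffusion_pytorch_model.safetensors",
]


def vae_urls():
    """Return the full download URL for each VAE file listed above."""
    return [f"{BASE_URL}/{name}" for name in FILES]


def download_vae(target_dir):
    """Download every VAE file into target_dir, skipping files already present."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url in vae_urls():
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():
            urlretrieve(url, destination)
    return target


# Example call (replace <largefiles_dir> with your own path before running):
# download_vae("<largefiles_dir>/skewed_relations_T2I/from_pretrained/vae/sd2")
```

Skipping existing files makes the function safe to re-run after an interrupted download, although a partially written file would need to be deleted by hand first.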
To reproduce the results in our paper, copy the configs from:
- Experiments on synthetic images: skewed_relations_T2I/scripts/diffuser_icons/configs/vae_icons_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_icons/configs/vae_icons_twoobjs_ft_config.py
- Experiments on natural images: skewed_relations_T2I/scripts/diffuser_real/configs/vae_natural_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_real/configs/vae_natural_twoobjs_ft_config.py
Same as previous sections.
@huggingface Diffusers
@amitakamath whatsup_vlms
@article{chang2024skews,
title={Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation},
author={Chang, Yingshan and Zhang, Yasi and Fang, Zhiyuan and Wu, Yingnian and Bisk, Yonatan and Gao, Feng},
journal={arXiv preprint arXiv:2403.16394},
year={2024}
}