This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper "Dimba: Transformer-Mamba Diffusion Models". You can find more visualizations on our project page.

TL;DR: Dimba is a new text-to-image diffusion model that employs a hybrid architecture combining Transformer and Mamba elements, thus capitalizing on the advantages of both architectural paradigms.
- Python 3.10

```bash
conda create -n your_env_name python=3.10
```

- Requirements file

```bash
pip install -r requirements.txt
```

- Install `causal_conv1d` and `mamba`

```bash
pip install -e causal_conv1d
pip install -e mamba
```
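After installation, a quick smoke test can confirm that the CUDA kernels built correctly. This is a minimal sketch assuming the editable installs expose the usual upstream package names (`causal_conv1d` and `mamba_ssm`); adjust if the local forks differ:

```python
# Smoke test for the Mamba dependencies; package names are assumed
# to match the upstream causal-conv1d and mamba_ssm releases.
import torch
from causal_conv1d import causal_conv1d_fn  # noqa: F401  (import check only)
from mamba_ssm import Mamba

print(torch.__version__, "CUDA available:", torch.cuda.is_available())

block = Mamba(d_model=64).cuda()            # toy Mamba block
x = torch.randn(1, 16, 64, device="cuda")   # (batch, seq_len, d_model)
print(block(x).shape)                       # expect torch.Size([1, 16, 64])
```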
Models reported in the paper can be downloaded directly as follows (uploads in progress):
| Model | #Params | URL |
|---|---|---|
| T5 | 4.3B | huggingface |
| VAE | 80M | huggingface |
| Dimba-L-512 | 0.9B | huggingface |
| Dimba-L-1024 | 0.9B | - |
| Dimba-L-2048 | 0.9B | - |
| Dimba-G-512 | 1.8B | - |
| Dimba-G-1024 | 1.8B | - |
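The checkpoints can also be fetched programmatically with `huggingface_hub`. This is a sketch only; the repo id below is a placeholder for whichever id the links above resolve to:

```python
# Fetch released weights from the Hub; "feizhengcong/Dimba" is a
# hypothetical repo id -- substitute the real one from the table.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="feizhengcong/Dimba",
    allow_patterns=["*.pth", "*.json"],  # weights and configs only
)
print("checkpoints saved under", local_dir)
```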
The dataset used for quality tuning (aesthetic performance enhancement) can be downloaded as follows:
| Dataset | Size | URL |
|---|---|---|
| Quality tuning | 600k | huggingface |
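If the upload follows the standard Hub layout, the `datasets` library should be able to load it directly; the dataset id here is a placeholder for the link above, and the split/format should be checked on the dataset card:

```python
# Hypothetical loading example; "feizhengcong/Dimba-quality-tuning"
# is a placeholder id.
from datasets import load_dataset

ds = load_dataset("feizhengcong/Dimba-quality-tuning", split="train")
print(ds)  # ~600k text-image pairs expected
```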
We include an inference script that samples images from a Dimba model according to textual prompts. It supports the DDIM and DPM-Solver sampling algorithms. You can run it as:
```bash
python scripts/inference.py \
    --image_size 512 \
    --model_version dimba-l \
    --model_path /path/to/model \
    --txt_file asset/examples.txt \
    --save_path /path/to/save/results
```
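The `--txt_file` argument points at a plain-text prompt file. As a sketch, you can write a prompt file and call the script from Python; one prompt per line is an assumption here, so compare with `asset/examples.txt` for the exact format:

```python
# Write a prompt file and invoke the inference script. One prompt
# per line is an assumption -- compare with asset/examples.txt.
from pathlib import Path
import subprocess

prompts = [
    "a corgi astronaut floating in space, digital art",
    "an oil painting of a lighthouse at dawn",
]
Path("asset/my_prompts.txt").write_text("\n".join(prompts))

subprocess.run([
    "python", "scripts/inference.py",
    "--image_size", "512",
    "--model_version", "dimba-l",
    "--model_path", "/path/to/model",   # released checkpoint path
    "--txt_file", "asset/my_prompts.txt",
    "--save_path", "outputs/samples",
], check=True)
```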
We provide a training script for Dimba in `scripts/train.py`, which can be used for fine-tuning under different settings. You can launch it with `torch.distributed.launch` (deprecated in newer PyTorch in favor of `torchrun`) as:
```bash
python -m torch.distributed.launch --nnodes=4 --nproc_per_node=8 \
    --master_port=1234 scripts/train.py \
    configs/dimba_xl2_img512.py \
    --work-dir outputs
```
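The positional argument is a Python config module. Since the codebase builds on PixArt (see the acknowledgement below), a fine-tuning config would presumably follow the same style; every field name in this sketch is an assumption to verify against the shipped `configs/dimba_xl2_img512.py`:

```python
# configs/dimba_xl2_img512_ft.py -- hypothetical fine-tuning config.
# All fields are assumptions modeled on PixArt-style configs; check
# them against configs/dimba_xl2_img512.py before use.
_base_ = ["./dimba_xl2_img512.py"]       # inherit the shipped defaults

image_size = 512
train_batch_size = 32
num_epochs = 10
load_from = "/path/to/Dimba-L-512.pth"   # start from released weights
work_dir = "outputs/finetune"
```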
```bibtex
@misc{fei2024dimba,
    title={Dimba: Transformer-Mamba Diffusion Models},
    author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Debang Li and Youqiang Zhang and Junshi Huang},
    year={2024},
    eprint={2406.01159},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
The codebase is based on the awesome PixArt, Vim, and DiS repos.