
[AAAI 2025] Effective Diffusion Transformer Architecture for Image Super-Resolution


1 Xidian University   2 Huawei Noah's Ark Lab  
3 CBG, Huawei   4 Chongqing University of Posts and Telecommunications

🔎 Introduction

We propose DiT-SR, an effective diffusion transformer for real-world image super-resolution, with two key contributions:

  • An effective yet efficient architecture design;
  • Adaptive Frequency Modulation (AdaFM) for time-step conditioning (a sketch follows this list).
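
For intuition only, here is a minimal, hypothetical PyTorch sketch of the AdaFM idea: features are moved to the frequency domain and rescaled per frequency by learnable weights conditioned on the timestep embedding. All names and shapes below are illustrative and do not mirror the repository's implementation.

import torch
import torch.nn as nn

class AdaFM(nn.Module):
    # Hypothetical sketch: rescale features per frequency band,
    # conditioned on the diffusion timestep embedding.
    # Assumes square feature maps of size feat_size x feat_size.
    def __init__(self, channels, emb_dim, feat_size):
        super().__init__()
        # project the timestep embedding to a per-channel scale
        self.to_mod = nn.Linear(emb_dim, channels)
        # learnable per-frequency weights (rfft2 keeps W//2+1 columns)
        self.freq_weight = nn.Parameter(
            torch.ones(channels, feat_size, feat_size // 2 + 1))

    def forward(self, x, t_emb):
        # x: (B, C, H, W) features; t_emb: (B, emb_dim)
        mod = self.to_mod(t_emb)[:, :, None, None]      # (B, C, 1, 1)
        x_f = torch.fft.rfft2(x, norm="ortho")          # to frequency domain
        x_f = x_f * (self.freq_weight * mod)            # timestep-aware rescaling
        return torch.fft.irfft2(x_f, s=x.shape[-2:], norm="ortho")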

⚙️ Dependencies and Installation

git clone https://github.com/kunncheng/DiT-SR.git
cd DiT-SR

conda create -n DiT_SR python=3.10 -y
conda activate DiT_SR
pip install -r requirements.txt

🌈 Training

Datasets

The training data comprises LSDIR, DIV2K, DIV8K, OutdoorSceneTraining, Flickr2K, and the first 10K face images from FFHQ. We save all image paths in txt files. For simplicity, you can also train on the LSDIR dataset alone.
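
If you assemble your own data, a minimal sketch for building such a path list might look as follows (write_path_list is a hypothetical helper, and the exact txt format the configs expect may differ):

from pathlib import Path

# Hypothetical helper: collect all training image paths into one txt file.
def write_path_list(data_dirs, out_txt):
    exts = {".png", ".jpg", ".jpeg"}
    with open(out_txt, "w") as f:
        for d in data_dirs:
            for p in sorted(Path(d).rglob("*")):
                if p.suffix.lower() in exts:
                    f.write(str(p) + "\n")

write_path_list(["LSDIR/train"], "train_paths.txt")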

Pre-trained Models

Several checkpoints should be downloaded to the weights folder, including the autoencoder and other pre-trained models used for loss computation.

Training Scripts

Real-world Image Super-resolution

torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_DiT.yaml --save_dir ${save_dir}

Blind Face Restoration

torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/faceir_DiT.yaml --save_dir ${save_dir}

🚀 Inference and Evaluation

Real-world Image Super-resolution

Real-world datasets: RealSR and RealSet65; synthetic dataset: LSDIR-Test. Download the pretrained checkpoints, then run:

bash test_realsr.sh

Blind Face Restoration

Real-world datasets: LFW, WebPhoto, and Wider; synthetic dataset: CelebA-HQ. Download the pretrained checkpoints, then run:

bash test_faceir.sh

We are unable to release the synthetic test sets (LSDIR-Test and CelebA-HQ) due to corporate review restrictions, but you can generate them yourself using these scripts.
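
As a rough illustration only, the simplest synthetic pair generation is bicubic downsampling; the repository's actual degradation scripts are likely more involved (make_bicubic_lr and the paths below are hypothetical):

from pathlib import Path
from PIL import Image

# Hypothetical illustration: derive x4 bicubic LR images from HR images.
# The repo's degradation scripts may apply a more complex pipeline.
def make_bicubic_lr(hr_dir, lr_dir, scale=4):
    Path(lr_dir).mkdir(parents=True, exist_ok=True)
    for p in sorted(Path(hr_dir).glob("*.png")):
        hr = Image.open(p).convert("RGB")
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        lr.save(Path(lr_dir) / p.name)

make_bicubic_lr("LSDIR-Test/HR", "LSDIR-Test/LR_x4")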

🎓 Citation

If you find our work useful in your research, please consider citing:

@misc{cheng2024ditsr,
      title={Effective Diffusion Transformer Architecture for Image Super-Resolution},
      author={Kun Cheng and Lei Yu and Zhijun Tu and Xiao He and Liyu Chen and Yong Guo and Mingrui Zhu and Nannan Wang and Xinbo Gao and Jie Hu},
      year={2024},
      eprint={2409.19589},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.19589}, 
}

❤️ Acknowledgement

We sincerely appreciate the code release of the following projects: ResShift, DiT, FFTFormer, SwinIR, SinSR, and BasicSR.
