
[AAAI 2025] Effective Diffusion Transformer Architecture for Image Super-Resolution


1 Xidian University   2 Huawei Noah's Ark Lab  
3 CBG, Huawei   4 Chongqing University of Posts and Telecommunications

🔎 Introduction

We propose DiT-SR, an effective diffusion transformer for real-world image super-resolution, with two key contributions:

  • An effective yet efficient architecture design;
  • Adaptive Frequency Modulation (AdaFM) for time-step conditioning (a sketch follows this list).
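
For intuition only, here is a minimal, hypothetical PyTorch sketch of the AdaFM idea: features are moved to the frequency domain and rescaled per frequency by learnable weights conditioned on the timestep embedding. All names and shapes below are illustrative and do not mirror the repository's implementation.

import torch
import torch.nn as nn

class AdaFM(nn.Module):
    # Hypothetical sketch: rescale features per frequency band,
    # conditioned on the diffusion timestep embedding.
    # Assumes square feature maps of size feat_size x feat_size.
    def __init__(self, channels, emb_dim, feat_size):
        super().__init__()
        # project the timestep embedding to a per-channel scale
        self.to_mod = nn.Linear(emb_dim, channels)
        # learnable per-frequency weights (rfft2 keeps W//2+1 columns)
        self.freq_weight = nn.Parameter(
            torch.ones(channels, feat_size, feat_size // 2 + 1))

    def forward(self, x, t_emb):
        # x: (B, C, H, W) features; t_emb: (B, emb_dim)
        mod = self.to_mod(t_emb)[:, :, None, None]      # (B, C, 1, 1)
        x_f = torch.fft.rfft2(x, norm="ortho")          # to frequency domain
        x_f = x_f * (self.freq_weight * mod)            # timestep-aware rescaling
        return torch.fft.irfft2(x_f, s=x.shape[-2:], norm="ortho")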

⚙️ Dependencies and Installation

git clone https://github.com/kunncheng/DiT-SR.git
cd DiT-SR

conda create -n DiT_SR python=3.10 -y
conda activate DiT_SR
pip install -r requirements.txt

🌈 Training

Datasets

The training data comprises LSDIR, DIV2K, DIV8K, OutdoorSceneTraining, Flickr2K, and the first 10K face images from FFHQ. We save all image paths in txt files. For simplicity, you can also train on the LSDIR dataset alone.
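
If you assemble your own data, a minimal sketch for building such a path list might look as follows (write_path_list is a hypothetical helper, and the exact txt format the configs expect may differ):

from pathlib import Path

# Hypothetical helper: collect all training image paths into one txt file.
def write_path_list(data_dirs, out_txt):
    exts = {".png", ".jpg", ".jpeg"}
    with open(out_txt, "w") as f:
        for d in data_dirs:
            for p in sorted(Path(d).rglob("*")):
                if p.suffix.lower() in exts:
                    f.write(str(p) + "\n")

write_path_list(["LSDIR/train"], "train_paths.txt")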

Pre-trained Models

Several checkpoints should be downloaded to the weights folder, including the autoencoder and other pre-trained models used for loss computation.

Training Scripts

Real-world Image Super-resolution

torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_DiT.yaml --save_dir ${save_dir}

Blind Face Restoration

torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/faceir_DiT.yaml --save_dir ${save_dir}

🚀 Inference and Evaluation

Real-world Image Super-resolution

Real-world datasets: RealSR and RealSet65; synthetic dataset: LSDIR-Test. Download the pretrained checkpoints, then run:

bash test_realsr.sh

Blind Face Restoration

Real-world datasets: LFW, WebPhoto, and Wider; synthetic dataset: CelebA-HQ. Download the pretrained checkpoints, then run:

bash test_faceir.sh

We are unable to release the synthetic test sets (LSDIR-Test and CelebA-HQ) due to corporate review restrictions, but you can generate them yourself using these scripts.
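
As a rough illustration only, the simplest synthetic pair generation is bicubic downsampling; the repository's actual degradation scripts are likely more involved (make_bicubic_lr and the paths below are hypothetical):

from pathlib import Path
from PIL import Image

# Hypothetical illustration: derive x4 bicubic LR images from HR images.
# The repo's degradation scripts may apply a more complex pipeline.
def make_bicubic_lr(hr_dir, lr_dir, scale=4):
    Path(lr_dir).mkdir(parents=True, exist_ok=True)
    for p in sorted(Path(hr_dir).glob("*.png")):
        hr = Image.open(p).convert("RGB")
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        lr.save(Path(lr_dir) / p.name)

make_bicubic_lr("LSDIR-Test/HR", "LSDIR-Test/LR_x4")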

🎓 Citation

If you find our work useful in your research, please consider citing:

@misc{cheng2024ditsr,
      title={Effective Diffusion Transformer Architecture for Image Super-Resolution},
      author={Kun Cheng and Lei Yu and Zhijun Tu and Xiao He and Liyu Chen and Yong Guo and Mingrui Zhu and Nannan Wang and Xinbo Gao and Jie Hu},
      year={2024},
      eprint={2409.19589},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.19589}, 
}

❤️ Acknowledgement

We sincerely appreciate the code release of the following projects: ResShift, DiT, FFTFormer, SwinIR, SinSR, and BasicSR.
