Yixuan Zhu*, Wenliang Zhao*$\dagger$, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu$\ddagger$
* Equal contribution $\dagger$ Project leader $\ddagger$ Corresponding author
The repository contains the official implementation for the paper "FlowIE: Efficient Image Enhancement via Rectified Flow" (CVPR 2024, oral presentation).
FlowIE is a simple yet highly effective Flow-based Image Enhancement framework that estimates straight-line paths from an elementary distribution to high-quality images.
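To make the straight-line idea concrete, here is a toy sketch of sampling along a flow with Euler steps. It is not the FlowIE model: the name `euler_sample`, the 2-D points, and the constant velocity field are assumptions for illustration, whereas in FlowIE the velocity would come from the learned path estimator. When a path is perfectly straight, the velocity along it is constant, so a single Euler step already lands on the endpoint.

```python
def euler_sample(x0, velocity, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 2-D example: start and target points chosen for illustration.
start = [0.0, 0.0]
target = [1.0, 2.0]


def straight_velocity(x, t):
    # On a straight path from start to target the velocity is constant.
    return [b - a for a, b in zip(start, target)]


one_step = euler_sample(start, straight_velocity, steps=1)
four_steps = euler_sample(start, straight_velocity, steps=4)
```

Both `one_step` and `four_steps` end at the target, which is why straighter paths allow fewer sampling steps.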
- Release model and inference code.
- Release code for training dataloader.
We recommend using an Anaconda virtual environment. If Anaconda is installed, run the following commands to create and activate the environment.
conda env create -f requirements.txt
conda activate FlowIE
Since we use MemoryEfficientCrossAttention to accelerate the inference process, we need to slightly modify lora.py in the lora_diffusion package, which can be done in 2 minutes:
- (1) Locate the lora.py file in the package directory. You can easily find this file by using the "go to definition" button in Line 4 of the ./model/cldm.py file.
- (2) Make the following modifications to Lines 159-161 in lora.py:
Original Code:
UNET_DEFAULT_TARGET_REPLACE = {"CrossAttention", "Attention", "GEGLU"}
UNET_EXTENDED_TARGET_REPLACE = {"ResnetBlock2D", "CrossAttention", "Attention", "GEGLU"}
Modified Code:
UNET_DEFAULT_TARGET_REPLACE = {"CrossAttention", "Attention", "GEGLU", "MemoryEfficientCrossAttention"}
UNET_EXTENDED_TARGET_REPLACE = {"ResnetBlock2D", "CrossAttention", "Attention", "GEGLU", "MemoryEfficientCrossAttention", "ResBlock"}
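If your editor has no "go to definition" feature, the file can also be located from Python. The sketch below uses the standard `importlib.util.find_spec`; the helper name `locate_module_file` is hypothetical, and the module path `lora_diffusion.lora` is an assumption based on the package and file names mentioned above.

```python
import importlib.util


def locate_module_file(dotted_name):
    """Return the source file path of an installed module."""
    spec = importlib.util.find_spec(dotted_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate {dotted_name!r}")
    return spec.origin


# Usage inside the FlowIE environment (assumed module path):
#   print(locate_module_file("lora_diffusion.lora"))
```

Open the printed path in any editor and apply the two-line change shown above.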
We prepare the data in a similar way to GFPGAN & DiffBIR. We list the datasets for BFR and BSR as follows:
For BFR evaluation, please refer to here for BFR-test datasets, which include CelebA-Test, CelebChild-Test and LFW-Test. The WIDER-Test can be found here. For BFR training, please download the FFHQ dataset.
For BSR, we utilize ImageNet for training. For evaluation, you can refer to BSRGAN for RealSRSet.
To prepare the training list, simply run the script:
python ./scripts/make_file_list.py --img_folder /data/ILSVRC2012 --save_folder ./dataset/list/imagenet
python ./scripts/make_file_list.py --img_folder /data/FFHQ --save_folder ./dataset/list/ffhq
The file list looks like this:
/path/to/image_1.png
/path/to/image_2.png
/path/to/image_3.png
...
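For readers who want to build such a list by hand, the following is a minimal sketch of how one could be produced. It is not the repository's make_file_list.py: the function names, the set of accepted extensions, and the output file name in the usage comment are assumptions for illustration.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def collect_image_paths(img_folder):
    """Recursively collect image file paths under img_folder, sorted."""
    paths = []
    for root, _dirs, files in os.walk(img_folder):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    return sorted(paths)


def write_file_list(img_folder, list_path):
    """Write one image path per line to list_path; return the count."""
    paths = collect_image_paths(img_folder)
    parent = os.path.dirname(list_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return len(paths)


# Hypothetical usage:
#   write_file_list("/data/FFHQ", "./dataset/list/ffhq/train.list")
```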
Please download our pretrained checkpoints from this link and put them under ./weights. The file directory should be:
|-- weights
|--|-- FlowIE_bfr_v1.ckpt
|--|-- FlowIE_bsr_v1.ckpt
...
You can test FlowIE with the following commands:
- Evaluation for BFR
python inference_bfr.py --ckpt ./weights/FlowIE_bfr_v1.ckpt --has_aligned --input /data/celeba_512_validation_lq/ --output ./outputs/bfr_exp
- Evaluation for BSR
python inference_bsr.py --ckpt ./weights/FlowIE_bsr_v1.ckpt --input /data/testdata/ --output ./outputs/bsr_exp --sr_scale 4
- Quick Test
For a quick test, we collect some test samples in ./assets. You can run the demo for BFR:
python inference_bfr.py --ckpt ./weights/FlowIE_bfr_v1.ckpt --input ./assets/faces --output ./outputs/demo
And for BSR:
python inference_bsr.py --ckpt ./weights/FlowIE_bsr_v1.ckpt --input ./assets/real-photos/ --output ./outputs/bsr_exp --tiled --sr_scale 4
You can use --tiled for patch-based inference and --sr_scale to set the super-resolution scale, such as 2 or 4. You can set CUDA_VISIBLE_DEVICES=1 to choose which devices to use.
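Patch-based inference generally splits a large image into overlapping tiles, enhances each tile, and merges the results. The sketch below only computes the tile coordinates. It illustrates the general technique and is not the repository's --tiled implementation; `tile_boxes`, the default tile size, and the default stride are assumed names and values.

```python
def tile_boxes(height, width, tile=512, stride=256):
    """Return (top, left, bottom, right) boxes covering a height x width image.

    Tiles overlap when stride < tile; the last tile in each direction is
    shifted back so that it ends exactly at the image border.
    """
    if not 0 < stride <= tile:
        raise ValueError("stride must satisfy 0 < stride <= tile")

    def starts(size):
        if size <= tile:
            return [0]  # a single (possibly smaller) tile is enough
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile flush with the border
        return positions

    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in starts(height)
        for left in starts(width)
    ]
```

A smaller stride gives more overlap between neighbouring tiles, which usually reduces visible seams at the cost of more forward passes.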
The evaluation process can be done with one Nvidia GeForce RTX 3090 GPU (24GB VRAM). You can use more GPUs by specifying the GPU ids.
The key component in FlowIE is a path estimator tuned from Stable Diffusion v2.1 base. Please download it to ./weights. Another part is the initial module, which can be found in checkpoints.
Before training, you also need to configure training-related information in ./configs/train_cldm.yaml. Then run this command to start training:
python train.py --config ./configs/train_cldm.yaml
We would like to express our sincere thanks to the author of DiffBIR for the clear code base and quick response to our issues.
We also thank CodeFormer, Real-ESRGAN and LoRA, as our code partially borrows from them.
A new version of FlowIE based on the Diffusion Transformer (DiT) structure will be released soon! Thanks to the latest DiT works, including PixART and Stable Diffusion 3.
Please cite us if our work is useful for your research.
@misc{zhu2024flowie,
title={FlowIE: Efficient Image Enhancement via Rectified Flow},
author={Yixuan Zhu and Wenliang Zhao and Ao Li and Yansong Tang and Jie Zhou and Jiwen Lu},
year={2024},
eprint={2406.00508},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This code is distributed under an MIT LICENSE.