Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

🚩 Accepted by CVPR 2025

Lingchen Sun1,2 | Rongyuan Wu1,2 | Zhiyuan Ma1 | Shuaizheng Liu1,2 | Qiaosi Yi1,2 | Lei Zhang1,2

1The Hong Kong Polytechnic University, 2OPPO Research Institute

⏰ Update

  • 2025.3.25: Training code is released.
  • 2025.1.2: Code and models are released.
  • 2024.12.4: The paper and this repo are released.

⭐ If PiSA-SR is helpful for your images or projects, please star this repo. Thanks! 🤗

🌟 Overview Framework


(a) Training procedure of PiSA-SR. During training, the two LoRA modules are optimized separately, one for pixel-level and one for semantic-level enhancement.

(b) Inference procedure of PiSA-SR. During inference, users can either keep the default setting to reconstruct a high-quality image in a single diffusion step, or adjust λpix and λsem to control the strengths of the pixel-level and semantic-level enhancements.
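The adjustable inference can be pictured as scaling each LoRA's low-rank update before merging it into the frozen base weights, so that λpix and λsem act as per-module guidance scales. Below is a minimal sketch of this idea in PyTorch, assuming the two updates combine linearly; the names merge_dual_lora, lora_pix, and lora_sem are illustrative, not the repository's API.

import torch

def merge_dual_lora(w_base, lora_pix, lora_sem, lambda_pix=1.0, lambda_sem=1.0):
    # w_base:   (out, in) frozen base weight
    # lora_pix: (B, A) pair of the pixel-level LoRA, B: (out, r), A: (r, in)
    # lora_sem: (B, A) pair of the semantic-level LoRA
    b_pix, a_pix = lora_pix
    b_sem, a_sem = lora_sem
    # Each low-rank update is scaled by its lambda before being added, so
    # lambda_pix steers degradation removal and lambda_sem detail generation.
    return w_base + lambda_pix * (b_pix @ a_pix) + lambda_sem * (b_sem @ a_sem)

# Toy shapes: a 64x64 layer with two rank-4 LoRAs.
w = torch.randn(64, 64)
pix = (torch.randn(64, 4), torch.randn(4, 64))
sem = (torch.randn(64, 4), torch.randn(4, 64))
w_merged = merge_dual_lora(w, pix, sem, lambda_pix=1.0, lambda_sem=1.2)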

😍 Visual Results

Demo on Real-world SR

Demo on AIGC Enhancement

Adjustable SR Results


Increasing the guidance scale λpix on the pixel-level LoRA module gradually removes image degradations such as noise and compression artifacts, but an overly strong λpix over-smooths the SR image. Increasing the guidance scale λsem on the semantic-level LoRA module enriches the semantic details, but an overly high λsem introduces visual artifacts.

Comparisons with Other DM-Based SR Methods


⚙ Dependencies and Installation

# clone this repository
git clone https://github.com/csslc/PiSA-SR
cd PiSA-SR


# create an environment
conda create -n PiSA-SR python=3.10
conda activate PiSA-SR
pip install --upgrade pip
pip install -r requirements.txt
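
After installation, you can run a quick sanity check in the activated environment; this is a minimal sketch assuming PyTorch (installed via requirements.txt) and a CUDA-capable GPU:

import torch

# Verify that PyTorch is importable and the GPU is visible.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())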

🍭 Quick Inference

Step 1: Download the pretrained models
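
The commands below expect the Stable Diffusion 2.1 base model under preset/models/stable-diffusion-2-1-base and the PiSA-SR weights at preset/models/pisa_sr.pkl. As a minimal sketch, the diffusers-format base model can be fetched with huggingface_hub (the pisa_sr.pkl weights themselves should be downloaded from the links provided in this repository):

from huggingface_hub import snapshot_download

# Download the Stable Diffusion 2.1 base model into the directory that
# test_pisasr.py expects via --pretrained_model_path.
snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-1-base",
    local_dir="preset/models/stable-diffusion-2-1-base",
)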

Step 2: Prepare testing data

You can put the testing images in the preset/test_datasets folder.

Step 3: Run the testing command

For the default setting:

python test_pisasr.py \
--pretrained_model_path preset/models/stable-diffusion-2-1-base \
--pretrained_path preset/models/pisa_sr.pkl \
--process_size 512 \
--upscale 4 \
--input_image preset/test_datasets \
--output_dir experiments/test \
--default

For the adjustable setting:

python test_pisasr.py \
--pretrained_model_path preset/models/stable-diffusion-2-1-base \
--pretrained_path preset/models/pisa_sr.pkl \
--process_size 512 \
--upscale 4 \
--input_image preset/test_datasets \
--output_dir experiments/test \
--lambda_pix 1.0 \
--lambda_sem 1.0

🛠️ You can adjust lambda_pix and lambda_sem to control the strengths of pixel-wise fidelity and semantic-level detail enhancement.
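
To explore this trade-off systematically, you can sweep both scales and compare the outputs side by side. A minimal sketch using the flags documented above (the value grid is arbitrary):

import itertools
import subprocess

# Run test_pisasr.py once per (lambda_pix, lambda_sem) pair, writing each
# result to its own output folder for visual comparison.
for lpix, lsem in itertools.product([0.5, 1.0, 1.5], repeat=2):
    subprocess.run([
        "python", "test_pisasr.py",
        "--pretrained_model_path", "preset/models/stable-diffusion-2-1-base",
        "--pretrained_path", "preset/models/pisa_sr.pkl",
        "--process_size", "512",
        "--upscale", "4",
        "--input_image", "preset/test_datasets",
        "--output_dir", f"experiments/test_pix{lpix}_sem{lsem}",
        "--lambda_pix", str(lpix),
        "--lambda_sem", str(lsem),
    ], check=True)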

We integrate tile_diffusion and tile_vae into test_pisasr.py to save GPU memory during inference. You can change the tile sizes and strides according to the VRAM of your device.

python test_pisasr.py \
--pretrained_model_path preset/models/stable-diffusion-2-1-base \
--pretrained_path preset/models/pisa_sr.pkl \
--process_size 512 \
--upscale 4 \
--input_image preset/test_datasets \
--output_dir experiments/test \
--latent_tiled_size 96 \
--latent_tiled_overlap 32 \
--vae_encoder_tiled_size 1024 \
--vae_decoder_tiled_size 224 \
--default

🚋 Train

Step 1: Prepare training data

Generate a txt file for the training set: fill in the required information in get_path and run it to obtain a txt file recording the paths of the ground-truth images, which you can save as preset/gt_path.txt. The high-quality ground-truth images can be selected from your training dataset, and their txt file can be saved in preset/gt_selected_path.
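
If you prefer to build the list manually, the following is a minimal sketch of such a script; the dataset folder and extensions are placeholders for your own data:

from pathlib import Path

# Collect ground-truth image paths and write one absolute path per line.
gt_dir = Path("/path/to/your/gt_images")  # placeholder: your dataset folder
exts = {".png", ".jpg", ".jpeg"}
paths = sorted(p for p in gt_dir.rglob("*") if p.suffix.lower() in exts)

with open("preset/gt_path.txt", "w") as f:
    f.write("\n".join(str(p.resolve()) for p in paths) + "\n")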

Step 2: Train the model

  1. Download the pretrained Stable Diffusion v2.1 model to provide generative capabilities.

    wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt --no-check-certificate
  2. Download the RAM model for extracting text prompts, and put it into src/ram_pretrain_model.

  3. Start training.

    CUDA_VISIBLE_DEVICES="0,1,2,3" accelerate launch train_pisasr.py \
    --pretrained_model_path="preset/models/stable-diffusion-2-1-base" \
    --pretrained_model_path_csd="preset/models/stable-diffusion-2-1-base" \
    --dataset_txt_paths="preset/gt_path.txt" \
    --highquality_dataset_txt_paths="preset/gt_selected_path.txt" \
    --dataset_test_folder="preset/testfolder" \
    --learning_rate=5e-5 \
    --train_batch_size=4 \
    --prob=0.1 \
    --gradient_accumulation_steps=1 \
    --enable_xformers_memory_efficient_attention --checkpointing_steps 500 \
    --seed 123 \
    --output_dir="experiments/train-pisasr" \
    --cfg_csd 7.5 \
    --timesteps1 1 \
    --lambda_lpips=2.0 \
    --lambda_l2=1.0 \
    --lambda_csd=1.0 \
    --pix_steps=4000 \
    --lora_rank_unet_pix=4 \
    --lora_rank_unet_sem=4 \
    --min_dm_step_ratio=0.02 \
    --max_dm_step_ratio=0.5 \
    --null_text_ratio=0.5 \
    --align_method="adain" \
    --deg_file_path="params.yml" \
    --tracker_project_name "PiSASR" \
    --is_module True

Citations

If our code helps your research or work, please consider citing our paper. The following is a BibTeX reference:

@inproceedings{sun2024pisasr,
  title={Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach},
  author={Sun, Lingchen and Wu, Rongyuan and Ma, Zhiyuan and Liu, Shuaizheng and Yi, Qiaosi and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

License

This project is released under the Apache 2.0 license.

Acknowledgement

This project is based on OSEDiff. Thanks for the awesome work.

Contact

If you have any questions, please contact: ling-chen.sun@connect.polyu.hk
