YouTube | Poster | Enhancement Model | demo | Introduction (in Chinese)
We aim to increase video resolution and frame rate in an end-to-end manner (end-to-end STVSR). This project is the implementation of Scale-Adaptive Feature Aggregation for Efficient Space-Time Video Super-Resolution. Our SAFA network outperforms recent state-of-the-art methods such as TMNet and VideoINR by an average of over 0.5 dB in PSNR, while requiring less than half the parameters and only one-third of the computational cost.
We have released some dedicated visual-effect models for ordinary users. Some of our insights on multi-scale processing and feature fusion are reflected in the RIFE applications; see Practical-RIFE.
Space-Time Super-Resolution:
![Space-Time Super-Resolution demo](https://private-user-images.githubusercontent.com/10103856/278423362-a243c9e2-243e-4ce6-a5c0-3739d98eb22c.png)
```
git clone git@github.com:megvii-research/WACV2024-SAFA.git
cd WACV2024-SAFA
pip3 install -r requirements.txt
```
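As a quick sanity check after installing the requirements (this snippet is ours, not part of the repository), you can confirm that PyTorch was installed correctly and can see a CUDA device:

```python
# Quick environment check (not part of the repository): verifies that the
# PyTorch installed via requirements.txt can see a CUDA-capable GPU.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```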
Download the pretrained model from Google Drive.
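As a rough sketch (not taken from the repository), you can inspect the downloaded checkpoint with plain `torch.load`; the filename `safa.pkl` and the assumption that it stores a flat state dict are hypothetical, so adapt them to whatever the downloaded archive actually contains:

```python
# Hypothetical sketch: inspect the downloaded weights. The filename "safa.pkl"
# and the flat state-dict layout are assumptions, not the repository's format.
import torch

state_dict = torch.load("safa.pkl", map_location="cpu")
for name, tensor in state_dict.items():
    # Listing parameter names and shapes is a cheap way to verify the download.
    print(name, tuple(tensor.shape))
```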
Image Interpolation
```
python3 inference_img.py --img demo/i0.png demo/i1.png --exp=3
```
(2^3=8X interpolation results)
```
python3 inference_img.py --img demo/i0.png demo/i1.png --ratio=0.4
```
(for an arbitrary timestep)
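To make the two flags concrete, here is a small illustration of our own (not code from `inference_img.py`): `--exp=N` requests 2^N× interpolation, i.e. the 2^N − 1 evenly spaced timesteps between the two input frames, while `--ratio=t` requests a single frame at timestep t:

```python
# Illustration (not from inference_img.py) of which intermediate timesteps
# the --exp and --ratio flags correspond to.

def timesteps_for_exp(exp: int) -> list:
    """--exp=N means 2**N x interpolation: 2**N - 1 evenly spaced timesteps in (0, 1)."""
    n = 2 ** exp
    return [k / n for k in range(1, n)]

def timestep_for_ratio(ratio: float) -> float:
    """--ratio=t means a single intermediate frame at timestep t between i0 and i1."""
    assert 0.0 < ratio < 1.0
    return ratio

print(timesteps_for_exp(3))     # [0.125, 0.25, ..., 0.875] -> 8X interpolation
print(timestep_for_ratio(0.4))  # 0.4
```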
We use 16 CPUs and 4 GPUs for training:
```
python3 -m torch.distributed.launch --nproc_per_node=4 train.py --world_size=4
```
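For readers unfamiliar with the launcher, the sketch below (placeholders on our side, not the actual `train.py`) shows the per-process setup that `torch.distributed.launch --nproc_per_node=4` drives: each process reads the rank environment variables the launcher exports and wraps its model in DistributedDataParallel.

```python
# Condensed sketch of per-process distributed setup; the model here is a
# placeholder, not the SAFA network from train.py.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed():
    # torch.distributed.launch exports MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE
    # (and, in recent PyTorch versions, LOCAL_RANK) for each spawned process.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # reads the env:// variables above
    return local_rank

if __name__ == "__main__":
    local_rank = setup_distributed()
    model = torch.nn.Conv2d(3, 3, 3).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    # The real train.py builds the SAFA network, data loaders, and training loop here.
```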
The training scheme is mainly adopted from RIFE.
We sincerely recommend some related papers:
ECCV22 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
CVPR23 - A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
If you find this project helpful, please feel free to leave a star or cite our paper:
```
@inproceedings{huang2024safa,
  title={Scale-Adaptive Feature Aggregation for Efficient Space-Time Video Super-Resolution},
  author={Huang, Zhewei and Huang, Ailin and Hu, Xiaotao and Hu, Chen and Xu, Jun and Zhou, Shuchang},
  booktitle={Winter Conference on Applications of Computer Vision (WACV)},
  year={2024}
}
```