The official PyTorch implementation of IFRNet (CVPR 2022).
Authors: Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, Ying Tai, Chengjie Wang, Jie Yang
Existing flow-based frame interpolation methods almost all first estimate or model intermediate optical flow, and then use flow warped context features to synthesize target frame. However, they ignore the mutual promotion of intermediate optical flow and intermediate context feature. Also, their cascaded architecture can substantially increase the inference delay and model parameters, blocking them from lots of mobile and real-time applications. For the first time, we merge above separated flow estimation and context feature refinement into a single encoder-decoder based IFRNet for compactness and fast inference, where these two crucial elements can benefit from each other. Moreover, task-oriented flow distillation loss and feature space geometry consistency loss are newly proposed to promote intermediate motion estimation and intermediate feature reconstruction of IFRNet, respectively. Benchmark results demonstrate that our IFRNet not only achieves state-of-the-art VFI accuracy, but also enjoys fast inference speed and lightweight model size.
[4K60p] うたわれるもの 偽りの仮面 OP フレーム補間+超解像 (IFRnetとReal-CUGAN)
[4K60p] 天神乱漫 -LUCKY or UNLUCKY!?- OP (IFRnetとReal-CUGAN)
- PyTorch >= 1.3.0 (We have verified that this repository supports Python 3.6/3.7, PyTorch 1.3.0/1.9.1).
- Download training and test datasets: Vimeo90K, UCF101, SNU-FILM, Middlebury, GoPro and Adobe240.
- Set the right dataset path on your machine.
Figures from left to right are overlaid input frames, 2x and 8x video interpolation results respectively.
-
Download our pre-trained models in this link, and then put file
checkpoints
into the root dir. -
Run the following scripts to generate 2x and 8x frame interpolation demos
$ python demo_2x.py
$ python demo_8x.py
- First, run this script to generate optical flow pseudo label
$ python generate_flow.py
- Then, start training by executing one of the following commands with selected model
$ python -m torch.distributed.launch --nproc_per_node=4 train_vimeo90k.py --world_size 4 --model_name 'IFRNet' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_vimeo90k.py --world_size 4 --model_name 'IFRNet_L' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_vimeo90k.py --world_size 4 --model_name 'IFRNet_S' --epochs 300 --batch_size 6 --lr_start 1e-4 --lr_end 1e-5
To test running time and model parameters, you can run
$ python benchmarks/speed_parameters.py
To test frame interpolation accuracy on Vimeo90K, UCF101 and SNU-FILM datasets, you can run
$ python benchmarks/Vimeo90K.py
$ python benchmarks/UCF101.py
$ python benchmarks/SNU_FILM.py
Proposed IFRNet achieves state-of-the-art frame interpolation accuracy with less inference time and computation complexity. We expect proposed single encoder-decoder joint refinement based IFRNet to be a useful component for many frame rate up-conversion, video compression and intermediate view synthesis systems. Time and FLOPs are measured on 1280 x 720 resolution.
Video comparison for 2x interpolation of methods using 2 input frames on SNU-FILM dataset.
Results on the Middlebury online benchmark.
Results on the Middlebury Other dataset.
- Start training by executing one of the following commands with selected model
$ python -m torch.distributed.launch --nproc_per_node=4 train_gopro.py --world_size 4 --model_name 'IFRNet' --epochs 600 --batch_size 2 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_gopro.py --world_size 4 --model_name 'IFRNet_L' --epochs 600 --batch_size 2 --lr_start 1e-4 --lr_end 1e-5
$ python -m torch.distributed.launch --nproc_per_node=4 train_gopro.py --world_size 4 --model_name 'IFRNet_S' --epochs 600 --batch_size 2 --lr_start 1e-4 --lr_end 1e-5
Since inter-frame motion in 8x interpolation setting is relatively small, task-oriented flow distillation loss is omitted here. Due to the GoPro training set is a relatively small dataset, we suggest to use your specific datasets to train slow-motion generation for better results.
Each video has 9 frames, where the first and the last frames are input, and the middle 7 frames are predicted by IFRNet.
ifrnet-ncnn-vulkan uses ncnn project as the universal neural network inference framework. This package includes all the binaries and models required. It is portable, so no CUDA or PyTorch runtime environment is needed.
When using any parts of the Software or the Paper in your work, please cite the following paper:
@InProceedings{Kong_2022_CVPR,
author = {Kong, Lingtong and Jiang, Boyuan and Luo, Donghao and Chu, Wenqing and Huang, Xiaoming and Tai, Ying and Wang, Chengjie and Yang, Jie},
title = {IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}