RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Last update: Jan 04, 2023

Overview

RIFE - Real Time Video Interpolation

arXiv | YouTube | Colab | Tutorial | Demo

Introduction
Collection
Usage
Evaluation
Training and Reproduction
Citation
Reference
Sponsor

Introduction

This project is the implementation of RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation. If you are a developer, you are welcome to follow Practical-RIFE, which aims to make RIFE more practical for users by adding various features and designing new models.

Currently, our model runs at 30+ FPS for 2X interpolation of 720p video on a 2080Ti GPU. It supports 2X, 4X, 8X, ... interpolation, as well as multi-frame interpolation between a pair of images.
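The 2X, 4X, 8X, ... factors fit a scheme of repeated bisection: each round inserts one frame midway between every neighbouring pair, so after exp rounds two inputs become 2^exp + 1 frames, i.e. 2^exp - 1 new ones (matching the 2^4 = 16X image example used later). The following is a minimal sketch of that bookkeeping only, assuming each new frame is predicted at t = 0.5; `bisect_interpolate` and its `midpoint` callback are hypothetical stand-ins for the model call, not RIFE's API.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def bisect_interpolate(
    img0: Frame,
    img1: Frame,
    exp: int,
    midpoint: Callable[[Frame, Frame], Frame],
) -> List[Frame]:
    """Return 2**exp + 1 frames from img0 to img1, endpoints included.

    Hypothetical helper: each round inserts one midpoint between every pair
    of neighbouring frames, doubling the number of intervals. `midpoint`
    stands in for a model call that predicts the frame at t = 0.5.
    """
    frames = [img0, img1]
    for _ in range(exp):
        refined = []
        for a, b in zip(frames, frames[1:]):
            refined.append(a)
            refined.append(midpoint(a, b))
        refined.append(frames[-1])
        frames = refined
    return frames


if __name__ == "__main__":
    # Numbers stand in for images and averaging stands in for the network.
    out = bisect_interpolate(0.0, 16.0, exp=4, midpoint=lambda a, b: (a + b) / 2)
    print(len(out))  # 17: the 2 inputs plus 15 interpolated frames
    print(out[:4])   # [0.0, 1.0, 2.0, 3.0]
```

With exp = 1, 2, 3 this gives the 2X, 4X and 8X cases; a factor that is not a power of two, such as 5X, does not come out of this scheme directly.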

16X interpolation results from two input images:

Software

CLI Usage

Installation

git clone git@github.com:hzwer/arXiv2020-RIFE.git
cd arXiv2020-RIFE
pip3 install -r requirements.txt

Download the pretrained HD models from here. (Baidu Netdisk link: https://pan.baidu.com/share/init?surl=u6Q7-i4Hu4Vx9_5BJibPPA, password: hfk3)
Unzip and move the pretrained parameters to train_log/*
This model is not the one reported in our paper; for the paper's model, please refer to Evaluation.

Run

Video Frame Interpolation

You can use our demo video or your own video.

python3 inference_video.py --exp=1 --video=video.mp4

(generates video_2X_xxfps.mp4)

python3 inference_video.py --exp=2 --video=video.mp4

(for 4X interpolation)

python3 inference_video.py --exp=1 --video=video.mp4 --scale=0.5

(If your video has a very high resolution, such as 4K, we recommend setting --scale=0.5 (default 1.0). If the output shows disordered patterns, try setting --scale=2.0. This parameter controls the resolution at which the optical flow model processes frames.)
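To build intuition for --scale, the sketch below computes the resolution at each level of a coarse-to-fine flow pyramid. It assumes, purely for illustration, three levels with downsampling factors 4, 2 and 1, each divided by scale; the helper name and the factors are hypothetical and not read from RIFE's code.

```python
from typing import List, Sequence, Tuple


def flow_resolutions(
    width: int,
    height: int,
    scale: float,
    base_factors: Sequence[float] = (4, 2, 1),  # assumed pyramid, coarse to fine
) -> List[Tuple[int, int]]:
    """Approximate (width, height) seen by each level of a flow pyramid.

    Hypothetical helper. Each level downsamples by factor / scale, so
    scale < 1 makes every level coarser and scale > 1 makes it finer.
    """
    sizes = []
    for factor in base_factors:
        effective = factor / scale
        sizes.append((round(width / effective), round(height / effective)))
    return sizes


if __name__ == "__main__":
    print(flow_resolutions(3840, 2160, scale=0.5))  # [(480, 270), (960, 540), (1920, 1080)]
    print(flow_resolutions(1280, 720, scale=1.0))   # [(320, 180), (640, 360), (1280, 720)]
```

Under that assumption, a 4K input at --scale=0.5 has its finest flow level at 1920x1080, about what a 1080p input gets at the default scale, while --scale=2.0 pushes the finest level above the input resolution.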

python3 inference_video.py --exp=2 --img=input/

(to read the video from PNGs such as input/0.png ... input/612.png; make sure the PNG file names are numbers)
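Numeric names matter because a plain string sort places 10.png before 2.png. A minimal sketch of numeric ordering, assuming names of the form <integer>.png; the helper names are hypothetical and this is not necessarily how inference_video.py orders its input.

```python
import os
from typing import Iterable, List


def sort_frame_names(names: Iterable[str]) -> List[str]:
    """Keep .png names and sort them by their integer stem ("2.png" before "10.png").

    Hypothetical helper. A non-numeric stem such as "frame_a.png" raises
    ValueError, which is why the names must be numbers.
    """
    pngs = [n for n in names if n.lower().endswith(".png")]
    return sorted(pngs, key=lambda n: int(os.path.splitext(n)[0]))


def list_numbered_pngs(folder: str) -> List[str]:
    """Numerically ordered PNG names found in `folder` (hypothetical helper)."""
    return sort_frame_names(os.listdir(folder))


if __name__ == "__main__":
    names = ["10.png", "2.png", "0.png", "1.png", "notes.txt"]
    print(sorted(n for n in names if n.endswith(".png")))  # string order: 10 before 2
    print(sort_frame_names(names))  # ['0.png', '1.png', '2.png', '10.png']
```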

python3 inference_video.py --exp=2 --video=video.mp4 --fps=60

(adds a slow-motion effect; the audio will be removed)
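The slow-motion amount can be estimated with simple arithmetic. This hypothetical helper assumes the output holds exactly 2^exp frames per input frame (ignoring the final frame and any skipped static frames); the 30 FPS source in the example is an assumption, since the command above works on any video.

```python
def slowdown_factor(source_fps: float, exp: int, output_fps: float) -> float:
    """How many times slower the result plays than the source (hypothetical helper).

    Assumes 2**exp output frames per input frame: one second of source footage
    becomes source_fps * 2**exp frames, which last that many / output_fps seconds.
    """
    return source_fps * 2 ** exp / output_fps


if __name__ == "__main__":
    print(slowdown_factor(30, exp=2, output_fps=60))  # 2.0: 120 frames shown over 2 s
```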

python3 inference_video.py --video=video.mp4 --montage --png

(if you want to montage the original video, skip static frames, and save the output in PNG format)

The warning 'Warning: Your video has *** static frames, it may change the duration of the generated video.' means that your video's frame rate was previously changed by inserting static frames. This is common if a 25 FPS video has been converted to 30 FPS.
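To see why a 25 FPS to 30 FPS conversion triggers this warning: one simple way to convert is to repeat earlier frames to fill the extra slots, which makes 5 of every 30 output frames exact copies of their predecessor. The sketch below simulates such a conversion and counts near-identical consecutive frames with a mean-absolute-difference test. Frames are modeled as flat tuples of pixel values, the threshold is an arbitrary assumption, and neither helper is the check inference_video.py actually performs.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]  # hypothetical: a frame as a flat tuple of pixel values


def convert_fps_by_repetition(frames: Sequence[Frame], src_fps: int, dst_fps: int) -> List[Frame]:
    """Naive rate conversion: output frame i repeats source frame floor(i * src / dst)."""
    n_out = len(frames) * dst_fps // src_fps
    return [frames[i * src_fps // dst_fps] for i in range(n_out)]


def count_static_frames(frames: Sequence[Frame], threshold: float = 1e-3) -> int:
    """Count frames whose mean absolute difference from the previous frame is below threshold.

    The threshold is an arbitrary assumption for this sketch.
    """
    static = 0
    for prev, cur in zip(frames, frames[1:]):
        mad = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mad < threshold:
            static += 1
    return static


if __name__ == "__main__":
    # One second of 25 FPS footage in which every frame differs from its neighbours.
    source = [tuple([float(k)] * 4) for k in range(25)]
    converted = convert_fps_by_repetition(source, src_fps=25, dst_fps=30)
    print(len(converted), count_static_frames(source), count_static_frames(converted))  # 30 0 5
```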

Image Interpolation

python3 inference_img.py --img img0.png img1.png --exp=4

(2^4 = 16X interpolation results.) After that, you can use the PNGs to generate an MP4:

ffmpeg -r 10 -f image2 -i output/img%d.png -s 448x256 -c:v libx264 -pix_fmt yuv420p output/slomo.mp4 -q:v 0 -q:a 0

You can also use pngs to generate gif:

ffmpeg -r 10 -f image2 -i output/img%d.png -s 448x256 -vf "split[s0][s1];[s0]palettegen=stats_mode=single[p];[s1][p]paletteuse=new=1" output/slomo.gif

Run in docker

Place the pre-trained models in train_log/*.pkl (as above).

Building the container:

docker build -t rife -f docker/Dockerfile .

Running the container:

docker run --rm -it -v $PWD:/host rife:latest inference_video --exp=1 --video=untitled.mp4 --output=untitled_rife.mp4

docker run --rm -it -v $PWD:/host rife:latest inference_img --img img0.png img1.png --exp=4

Using GPU acceleration (requires proper GPU drivers for Docker):

docker run --rm -it --gpus all -v /dev/dri:/dev/dri -v $PWD:/host rife:latest inference_video --exp=1 --video=untitled.mp4 --output=untitled_rife.mp4

Evaluation

Download the RIFE model reported in our paper.

UCF101: Download the UCF101 dataset to ./UCF101/ucf101_interp_ours/

Vimeo90K: Download the Vimeo90K dataset to ./vimeo_interp_test

MiddleBury: Download the MiddleBury OTHER dataset to ./other-data and ./other-gt-interp

HD: Download the HD dataset to ./HD_dataset. We also provide a Google Drive download link.

# RIFE
python3 benchmark/UCF101.py
# "PSNR: 35.282 SSIM: 0.9688"
python3 benchmark/Vimeo90K.py
# "PSNR: 35.615 SSIM: 0.9779"
python3 benchmark/MiddleBury_Other.py
# "IE: 1.956"
python3 benchmark/HD.py
# "PSNR: 32.14"
python3 benchmark/HD_multi.py
# "PSNR: 18.60(544*1280), 29.02(720p), 24.73(1080p)"
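For readers unfamiliar with the metrics printed above: PSNR is higher-is-better, and IE (interpolation error, reported on Middlebury) is lower-is-better. A minimal sketch of both, assuming images flattened to sequences of pixel values, PSNR computed on values in [0, 1], and IE as the root-mean-square difference on 0-255 values; the benchmark scripts may differ in details such as color conversion or rounding.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Peak signal-to-noise ratio in dB for pixel values in [0, 1]."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    if mse == 0:
        return math.inf  # identical images
    return -10.0 * math.log10(mse)  # peak is 1.0, so PSNR = 10 * log10(1 / mse)


def interpolation_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root-mean-square intensity difference for pixel values in [0, 255]."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))
```

For example, a prediction that is off by 0.1 at every pixel has an MSE of 0.01 and therefore a PSNR of 20 dB.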

Training and Reproduction

Download the Vimeo90K dataset.

We use 16 CPUs, 4 GPUs, and 20 GB of memory for training:

python3 -m torch.distributed.launch --nproc_per_node=4 train.py --world_size=4
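The command above assumes four visible GPUs: each worker selects its GPU from its local rank (train.py calls torch.cuda.set_device(args.local_rank), as the traceback in the comments below shows), so a rank without a matching GPU fails with "invalid device ordinal". The deprecation warning quoted there also notes that torch.distributed.run passes the rank through the LOCAL_RANK environment variable. The stdlib-only sketch below resolves the rank from either source and checks it against a GPU count before CUDA is touched; both function names are hypothetical and not part of train.py.

```python
import argparse
import os
import sys
from typing import Mapping, Sequence


def resolve_local_rank(argv: Sequence[str], environ: Mapping[str, str]) -> int:
    """Return the local rank from --local_rank / --local-rank, else $LOCAL_RANK, else 0.

    Hypothetical helper: both spellings are accepted because launchers may
    differ, and torch.distributed.run sets LOCAL_RANK in the environment.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank", type=int, default=None)
    args, _unknown = parser.parse_known_args(list(argv))  # ignore e.g. --world_size
    if args.local_rank is not None:
        return args.local_rank
    return int(environ.get("LOCAL_RANK", "0"))


def check_rank(local_rank: int, num_gpus: int) -> None:
    """Fail early with a readable message instead of 'invalid device ordinal'."""
    if not 0 <= local_rank < num_gpus:
        raise ValueError(
            f"local rank {local_rank} has no GPU ({num_gpus} visible); "
            "set --nproc_per_node to at most the number of GPUs"
        )


if __name__ == "__main__":
    rank = resolve_local_rank(sys.argv[1:], os.environ)
    # In a real training script, num_gpus would come from torch.cuda.device_count().
    print("local rank:", rank)
```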

Citation

@article{huang2020rife,
  title={RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation},
  author={Huang, Zhewei and Zhang, Tianyuan and Heng, Wen and Shi, Boxin and Zhou, Shuchang},
  journal={arXiv preprint arXiv:2011.06294},
  year={2020}
}

Reference

Optical Flow: ARFlow pytorch-liteflownet RAFT pytorch-PWCNet

Video Interpolation: DVF TOflow SepConv DAIN CAIN MEMC-Net SoftSplat BMBC EDSC EQVI

Sponsor

Thanks for your support! PayPal Sponsor: https://www.paypal.com/paypalme/hzwer

Comments

Welcome to try the v3.8 model

Based on evaluation on dozens of videos, the v3.8 model is more than 2X faster than the RIFEv2.4 model while also surpassing its quality, and it handles 2D scenes better. We welcome you to submit bad cases to help us improve future models.

v3.8 model: https://github.com/hzwer/Practical-RIFE#model-list

opened by hzwer 23
24 to 60 fps?

Hi there,

RIFE looks fantastic, but as far as I know I can only enter integers as the scale factor, correct? So when I want to interpolate 24 fps to 60 (by far the most common case, I suppose), the only way I know is to interpolate to 120 (factor 5) and then drop every other frame to get 60.

But even that doesn't seem possible, as the only supported scale factors are 2x, 4x, 8x (no 5x option).

So, is RIFE able to make 24 fps movie content run smooth on 60 Hz displays?
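The arithmetic behind the question: output frame k is displayed at k/60 s, which is source position k * 24/60 = 2k/5. Its fractional part, i.e. the timestep between two neighbouring source frames, cycles through 0, 2/5, 4/5, 1/5, 3/5. The sketch below enumerates these (source frame, timestep) pairs with exact fractions; it assumes an interpolator that accepts an arbitrary timestep, which this page does not say RIFE's scripts provide, and `retime_schedule` is a hypothetical name.

```python
import math
from fractions import Fraction
from typing import List, Tuple


def retime_schedule(src_fps: int, dst_fps: int, n_out: int) -> List[Tuple[int, Fraction]]:
    """For each output frame, return (earlier source frame index, timestep t).

    Hypothetical helper. t == 0 means the source frame can be copied as-is;
    otherwise the frame has to be synthesized at fraction t of the way from
    that source frame to the next one.
    """
    schedule = []
    for k in range(n_out):
        pos = Fraction(k * src_fps, dst_fps)  # output time, measured in source frames
        idx = math.floor(pos)
        schedule.append((idx, pos - idx))
    return schedule


if __name__ == "__main__":
    for idx, t in retime_schedule(24, 60, 6):
        print(idx, t)  # 0 0 / 0 2/5 / 0 4/5 / 1 1/5 / 1 3/5 / 2 0
```

Apart from t = 0 (a plain copy), none of these timesteps has a power-of-two denominator, so repeated 2X steps cannot hit them exactly. The 5X-then-drop idea from the question lands on the same timesteps: 5X gives 0, 1/5, 2/5, 3/5, 4/5 per interval, and keeping every other frame of the resulting 120 FPS stream leaves exactly the 60 FPS schedule.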

opened by spyro2000 22

can't train because torch incompatible with python version

/home/france1/.local/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/france1/arXiv2020-RIFE/train.py", line 140, in <module>
Traceback (most recent call last):
  File "/home/france1/arXiv2020-RIFE/train.py", line 140, in <module>
  File "/home/france1/arXiv2020-RIFE/train.py", line 140, in <module>
    torch.cuda.set_device(args.local_rank)
  File "/home/france1/.local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch.cuda.set_device(args.local_rank)
  File "/home/france1/.local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch.cuda.set_device(args.local_rank)
  File "/home/france1/.local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
    RuntimeErrortorch._C._cuda_setDevice(device): 
CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 651166) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/france1/.local/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/france1/.local/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/france1/.local/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/france1/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/home/france1/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/france1/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
***************************************
            train.py FAILED            
=======================================
Root Cause:
[0]:
  time: 2021-09-30_17:35:27
  rank: 1 (local_rank: 1)
  exitcode: 1 (pid: 651166)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
=======================================
Other Failures:
[1]:
  time: 2021-09-30_17:35:27
  rank: 2 (local_rank: 2)
  exitcode: 1 (pid: 651167)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
[2]:
  time: 2021-09-30_17:35:27
  rank: 3 (local_rank: 3)
  exitcode: 1 (pid: 651168)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
***************************************

opened by arch-user-france1 19

Transparent PNG support

Seeing that EXR support was recently added, is it possible to support transparency (alpha channel) for PNG input and output (using --img --png) in inference_video.py?

This would enable interpolation of transparent GIFs.
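One possible workaround for transparency with an RGB-only interpolator is to run it twice per frame pair: once on the color channels and once on the alpha channel copied into all three channels, then recombine. The sketch below shows only that channel bookkeeping, with pixels as (r, g, b, a) tuples; `interpolate_rgba` and the `interpolate_rgb` callback are hypothetical, and this is not a feature of inference_video.py.

```python
from typing import Callable, List, Sequence, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]
RGBImage = List[RGB]


def interpolate_rgba(
    img0: Sequence[RGBA],
    img1: Sequence[RGBA],
    interpolate_rgb: Callable[[RGBImage, RGBImage], RGBImage],
) -> List[RGBA]:
    """Interpolate RGBA images with an RGB-only interpolator (hypothetical, two passes)."""
    # Pass 1: color channels only.
    color = interpolate_rgb([p[:3] for p in img0], [p[:3] for p in img1])
    # Pass 2: alpha replicated into three channels so an RGB model accepts it.
    alpha = interpolate_rgb([(p[3],) * 3 for p in img0], [(p[3],) * 3 for p in img1])
    # Recombine, averaging the three (ideally identical) alpha channels.
    return [(*c, sum(a) / 3) for c, a in zip(color, alpha)]
```

A drawback of this approach is that the two passes estimate motion independently, so color and alpha edges are not guaranteed to line up.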

opened by n00mkrad 19

Image sequence and input

Thanks for adding the PNG output function. Could you make the output names consistent with ffmpeg, i.e. 0000.png, 0001.png, ..., 7821.png? Then we could use ffmpeg to process the image sequence.
Adding image sequence input would also be great.
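Zero-padded names such as 0000.png ... 7821.png are what a printf-style ffmpeg image2 pattern like %04d.png matches. A small sketch of generating both; the width of 4 and the helper names are assumptions, and the width has to be large enough for the frame count.

```python
from typing import List


def padded_frame_names(count: int, width: int = 4, ext: str = "png") -> List[str]:
    """Return ["0000.png", "0001.png", ...] for `count` frames (hypothetical helper)."""
    if count > 10 ** width:
        raise ValueError(f"{count} frames need more than {width} digits")
    return [f"{i:0{width}d}.{ext}" for i in range(count)]


def ffmpeg_pattern(width: int = 4, ext: str = "png") -> str:
    """printf-style input pattern for ffmpeg's image2 demuxer matching the names above."""
    return f"%0{width}d.{ext}"


if __name__ == "__main__":
    names = padded_frame_names(7822)
    print(names[0], names[-1], ffmpeg_pattern())  # 0000.png 7821.png %04d.png
```

Such a folder could then be read with an input like `-f image2 -i output/%04d.png`, in the style of the ffmpeg commands earlier in this README.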

opened by Michaelwhite34 15

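Question: How to prepare the dataset when it has multiple frames
Hello, first of all, thank you very much for sharing this repo on GitHub!

After running inference with the model you released, I wanted to try training on a custom dataset.

Each video in my dataset has 24 frames, so the goal is to generate the 22 frames in between.

I looked at the VimeoDataset class in dataset.py and found that, because each video in that dataset has only 3 frames, the data preparation always returns the first and last frames, and the middle frame is the one to be interpolated.

I'd like to ask: if I want to interpolate multiple frames, is that possible?

I have already started training with a small adjustment: the input is the first and last frames, and the model has to predict the middle frame (the 12th frame). For the model, I chose RIFE.py and IFNet.py, which should correspond to the most robust released model (42.9MB). Because my dataset is fairly homogeneous, to prevent overfitting I first loaded the model you released and then continued training.

However, I found that after a few dozen epochs the loss blew up, and at inference time the resulting model completely failed to generate the 22 intermediate frames (all black images), far from the results I first got with your released model...

Later I tried training another model (RIFE_HDV3.py and IFNet_HDV3.py) (the resulting model is 12.2MB), but PyTorch kept raising an error.

-RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

The error originates in update() in RIFE_HDV3.py, at flow, mask, merged = self.flownet(imgs, scale=[4,2,1]). I looked it up: this error occurs because some variables in the output of forward() are not used to calculate the loss. I then went through forward() in IFNet_HDV3 very carefully, but still could not find the cause...

If you have any good suggestions, I would be very grateful!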
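Regarding the multi-frame question above: with clips of N frames, one way to prepare data is to keep the first and last frames as inputs and use any intermediate frame as the target, together with its timestep t = k/(N-1). This assumes a model that takes t as an input; the VimeoDataset triplet setup described above only needs the midpoint. The function name and tuple layout are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def multiframe_samples(num_frames: int) -> List[Tuple[int, int, int, Fraction]]:
    """List (first, last, target, t) training samples for one clip (hypothetical helper).

    Inputs are always frames 0 and num_frames - 1; every frame in between
    becomes a target at timestep t = target / (num_frames - 1).
    """
    if num_frames < 3:
        raise ValueError("need at least one frame between the two inputs")
    last = num_frames - 1
    return [(0, last, k, Fraction(k, last)) for k in range(1, last)]


if __name__ == "__main__":
    samples = multiframe_samples(24)
    print(len(samples), samples[0])  # 22 targets, the first at t = 1/23
```

With 3-frame Vimeo triplets this reduces to the single sample (0, 2, 1, 1/2).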
opened by chenyuZha 13
Not the fastest for multi-frame interpolation

Hi,

Thanks for open sourcing the code and contributing to the video frame interpolation community.

In the paper, it mentioned: "Coupled with the large complexity in the bi-directional flow estimation, none of these methods can achieve real-time speed"

I believe that might be inappropriate to say, as a recently published paper (https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720103.pdf) targets efficient multi-frame interpolation.

It also uses bi-directional flow estimation, but it generates 7 frames in 0.12 seconds, whereas your method requires 0.036 * 7 = 0.252 seconds.

And the model from that paper is compact, with only ~2M parameters, whereas your fast model has ~10M parameters.

opened by mrchizx 13
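Reproducing the hdv2 and hdv3 models

Hello, I reproduced the hdv2 and hdv3 models, but there is always some gap compared with the results you provided. My hyperparameters: weight_decay=1e-4, learning rate: 3e-4 * mul, VGG loss, the data augmentation mentioned in #146, patch size 256, and 300 training epochs. In the hdv2 and v3 versions you provided, the model structure does not change, so what are the differences between them? Besides my configuration above, what else do I need to do to reach your results?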

opened by tqyunwuxin 12
Model v2 update log

We show some hard-case results for every model version. v2 Google Drive download link: (https://drive.google.com/file/d/1wsQIhHZ3Eg4_AfCXItFKqqyDMB4NS0Yd/view).

v1.1 2020.11.16 Link: https://pan.baidu.com/s/1SPRw_u3zjaufn7egMr19Eg Password: orkd

opened by hzwer 12
Training with other datasets

Has anyone trained RIFE_HDv2 with a training set other than the Vimeo dataset, such as the HD dataset?

And were they able to get better visual quality for HD content?

opened by rsjjdesj 11
replicating benchmarks

Thank you for sharing your code! I was trying to replicate the numbers you stated in your paper using this implementation but have unfortunately been unsuccessful so far. Would you be able to share a script that can be used to replicate the Vimeo-90k metrics you quoted? Also, I think the following padding has some issues.

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L26-L28

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L45

The pw - w and [:h, :w] indicate that pw > w (and ph > h). However, pw = 340 // 32 * 32 = 320 for w = 340 which violates this condition. Thanks for looking into this and thanks again for sharing your code!
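The padding issue above comes down to rounding down versus rounding up: w // 32 * 32 rounds down (340 becomes 320, smaller than the image), while padding needs the smallest multiple of 32 that is at least w. A minimal sketch of the round-up version; the helper name is hypothetical.

```python
def pad_to_multiple(size: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= size, for size >= 1 (hypothetical helper)."""
    return ((size - 1) // multiple + 1) * multiple


if __name__ == "__main__":
    w = 340
    print(w // 32 * 32, pad_to_multiple(w))  # 320 352: rounding down vs. rounding up
```

With pw = pad_to_multiple(w) and ph = pad_to_multiple(h), the padding amounts pw - w and ph - h are never negative, and cropping the output with [:h, :w] restores the original size.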

opened by sniklaus 11
Reproducibility results

Hi,

I checked if I can reproduce the results similar to those in the paper to make sure I am training the model properly. These are the results that I got on Vimeo triplets:

The model prediction is shown for t=1 (predicting between t=0 and t=2), the second row corresponds to interpolation ("Interpol"), and the last row to flow ("Flow pred"). I see that interpolation results are very good, however, I expected the flow to be a bit more accurate. In section 6.2 of the appendix you mention that "IFNet produces clear motion boundaries.", this is also what can be seen in Figure 10. Therefore I wanted to ask if there were any other training steps that I need to add to get flow prediction more accurate. I can of course share more prediction examples that I got.

The training was done for 300 epochs using the reconstruction losses and distillation loss (the latter is with coeff. 0.01) as described in the paper, I didn't change anything in the code to train and obtain these results. This is the loss plot:

I would appreciate if you can confirm that you trained the model the same way and that your flow predictions look similar. I used IFNet for training (self.flownet = IFNet()).

opened by HamidGadirov 4
How to visualize the flow inferred by the model?

flow, mask, merged = self.flownet(imgs, scale_list): flow should be the flow between the two images. Its shape is (bs, 4, H, W). How can I visualize it? And how can I generate the flow ground truth? Thanks!
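A common way to visualize a flow field is to map direction to hue and magnitude to brightness. The sketch below does this with only the standard library, for one 2-channel flow given as nested lists u[y][x] and v[y][x]. For the (bs, 4, H, W) tensor above, it assumes channels 0:2 and 2:4 hold two separate 2-channel flows (for example one toward each input frame) that are visualized one at a time; that layout is an assumption, and this simple HSV mapping is not the Middlebury color wheel.

```python
import colorsys
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def flow_to_rgb(u: List[List[float]], v: List[List[float]]) -> List[List[Pixel]]:
    """Map a 2-channel flow to RGB in [0, 1]: hue = direction, value = relative magnitude."""
    mags = [[math.hypot(du, dv) for du, dv in zip(ru, rv)] for ru, rv in zip(u, v)]
    max_mag = max((m for row in mags for m in row), default=0.0) or 1.0  # avoid division by 0
    image = []
    for ru, rv, rm in zip(u, v, mags):
        row = []
        for du, dv, m in zip(ru, rv, rm):
            angle = math.atan2(dv, du)            # direction in (-pi, pi]
            hue = (angle / (2 * math.pi)) % 1.0    # wrapped to [0, 1)
            row.append(colorsys.hsv_to_rgb(hue, 1.0, m / max_mag))
        image.append(row)
    return image
```

Here motion along +x comes out red at full strength, and zero motion comes out black.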

opened by zhishao 12

RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 1

Hi, I'm trying to interpolate a PNG sequence. The files are properly numbered. Here is my console output:

conda run -n RIFE py D:\Development\RIFE\inference_video.py --img "D:\Game Assets\Super Outrun Rush\Animations\Pixelated\WadeCharge" --exp=4
Loaded v3.x HD model.

  0%|          | 0/16 [00:00<?, ?it/s]Traceback (most recent call last):
  File "D:\Development\RIFE\inference_video.py", line 259, in <module>
    output = make_inference(I0, I1, 2**args.exp-1) if args.exp else []
  File "D:\Development\RIFE\inference_video.py", line 180, in make_inference
    middle = model.inference(I0, I1, args.scale)
  File "D:\Development\RIFE\train_log\RIFE_HDv3.py", line 58, in inference
    flow, mask, merged = self.flownet(imgs, scale_list)
  File "C:\Users\Jackson\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Development\RIFE\train_log\IFNet_HDv3.py", line 113, in forward
    merged[i] = merged[i][0] * mask_list[i] + merged[i][1] * (1 - mask_list[i])
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 1

opened by Apple-Fritter-Money-Entertainment 0

Releases(arxiv_v5_code)

arxiv_v5_code(Aug 13, 2021)

https://arxiv.org/abs/2011.06294v5

Owner

hzwer
