Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

Last update: Dec 04, 2022

Overview

CDFI (Compression-Driven-Frame-Interpolation)

[Paper] (Coming soon...) | [arXiv]

Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Introduction

We propose a Compression-Driven network design for Frame Interpolation (CDFI) that leverages model compression to significantly reduce the model size (which also allows a better understanding of the current architecture) while making room for further improvements and achieving superior performance in the end. Concretely, we first compress AdaCoF and show that a 10X compressed AdaCoF performs similarly to its original counterpart; we then improve upon this compressed model with simple modifications. Note that it is typically prohibitive to implement the same improvements on the original heavy model.

We achieve a significant performance gain with only a quarter of the size of the original AdaCoF

Model	Vimeo-90K (PSNR, SSIM, LPIPS)	Middlebury (PSNR, SSIM, LPIPS)	UCF101-DVF (PSNR, SSIM, LPIPS)	#Params
AdaCoF	34.38, 0.974, 0.019	35.74, 0.979, 0.019	35.20, 0.967, 0.019	21.8M
Compressed AdaCoF	34.15, 0.973, 0.020	35.46, 0.978, 0.019	35.14, 0.967, 0.019	2.45M
AdaCoF+	34.58, 0.975, 0.018	36.12, 0.981, 0.017	35.19, 0.967, 0.019	22.9M
Compressed AdaCoF+	34.46, 0.975, 0.019	35.76, 0.979, 0.019	35.16, 0.967, 0.019	2.56M
Our Final Model	35.19, 0.978, 0.010	37.17, 0.983, 0.008	35.24, 0.967, 0.015	4.98M

Our final model also performs favorably against other state-of-the-art methods (see our paper for details)
The proposed framework is generic and can be easily transferred to other DNN-based frame interpolation methods

The above GIF is a demo of using our method to generate slow motion video, which increases the FPS from 5 to 160. We also provide a long video demonstration here (redirect to YouTube).

Environment

CUDA 11.0
python 3.8.3
torch 1.6.0
torchvision 0.7.0
cupy 7.7.0
scipy 1.5.2
numpy 1.19.1
Pillow 7.2.0
scikit-image 0.17.2
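
As a convenience, the following is a minimal sketch (not part of the repository) that compares the installed package versions with the ones listed above. It relies only on the standard-library `importlib.metadata`; the helper name `check_environment` is hypothetical. Some packages are published under different distribution names (for example, CuPy wheels are often named per CUDA version), so a "not installed" result may need a manual check.

```python
from importlib import metadata

# Versions copied from the list above. CUDA and the Python interpreter
# itself are not pip packages, so they are left out.
EXPECTED = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "cupy": "7.7.0",
    "scipy": "1.5.2",
    "numpy": "1.19.1",
    "Pillow": "7.2.0",
    "scikit-image": "0.17.2",
}


def check_environment(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu110" before comparing,
        # so "1.6.0+cu110" still counts as a match for "1.6.0".
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_environment().items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Newer versions may well work; a mismatch reported here is only a hint about where to look if something fails.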

Test Pre-trained Models

Download repository:

$ git clone https://github.com/tding1/CDFI.git
$ cd CDFI/

Testing data

For user convenience, we already provide the Middlebury and UCF101-DVF test datasets in our repository, which can be found under directory test_data/.

Evaluation metrics

We use the built-in functions in skimage.metrics to compute PSNR and SSIM, for which higher is better. We also use LPIPS, a newly proposed metric that measures perceptual similarity, for which lower is better. For user convenience, we include an implementation of LPIPS in our repo under lpips_pytorch/, which is a slightly modified version of the one here (with an updated squeezenet backbone).
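
To make the PSNR numbers easier to interpret, here is a small dependency-free sketch of the standard PSNR definition, 10 * log10(data_range^2 / MSE). It is only an illustration of the formula; the repository itself calls skimage.metrics, and the function name `psnr` and the flat-list input format are assumptions made for this example.

```python
import math


def psnr(reference, estimate, data_range=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.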

Test our pre-trained CDFI model

$ python test.py --gpu_id 0

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/cdfi_adacof/.

Test the compressed AdaCoF

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 5 --dilation 1

By default, it will load the compressed AdaCoF model checkpoints/compressed_adacof_F_5_D_1.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_5_D_1/.

Test the compressed AdaCoF+

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 11 --dilation 2

By default, it will load the compressed AdaCoF+ model checkpoints/compressed_adacof_F_11_D_2.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_11_D_2/.

Interpolate two frames

$ python interpolate_twoframe.py --gpu_id 0 --first_frame figs/0.png --second_frame figs/1.png --output_frame output.png

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth, and generate the intermediate frame output.png given two consecutive frames in a sequence.
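
The slow-motion demo mentioned earlier goes from 5 to 160 FPS, a factor of 32 = 2^5, which is consistent with applying two-frame interpolation recursively five times. The README does not say how the demo was actually produced, so the sketch below is an assumption. `upsample` is a hypothetical helper, and `interpolate` stands in for any function that returns the middle frame of two inputs (for instance, a wrapper around the pre-trained model).

```python
def upsample(frames, interpolate, doublings):
    """Insert a midpoint between every pair of neighbours, `doublings` times.

    Each pass turns N frames into 2N - 1 frames, so the number of
    intervals (and hence the frame rate) doubles per pass.
    """
    frames = list(frames)
    if len(frames) < 2:
        return frames  # nothing to interpolate between
    for _ in range(doublings):
        dense = [frames[0]]
        for prev, nxt in zip(frames, frames[1:]):
            dense.append(interpolate(prev, nxt))  # synthesized middle frame
            dense.append(nxt)                     # original frame is kept
        frames = dense
    return frames
```

With numbers standing in for frames and averaging standing in for the model, `upsample([0.0, 32.0], lambda a, b: (a + b) / 2, 5)` yields 33 evenly spaced values, i.e. 32 intervals where there was one.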

Train Our Model

Training data

We use the Vimeo-90K triplet dataset for the video frame interpolation task, which is relatively large (>32 GB).

$ wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
$ unzip vimeo_triplet.zip
$ rm vimeo_triplet.zip
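
For orientation, the sketch below enumerates the training triplets from the extracted archive. It assumes the commonly distributed layout of the dataset (a `tri_trainlist.txt` file listing entries such as `00001/0001`, and frames stored as `sequences/<entry>/im1.png`, `im2.png`, `im3.png`); these names are not stated in this README, and the data loading in train.py may be organized differently. `list_triplets` is a hypothetical helper.

```python
from pathlib import Path


def list_triplets(data_dir, list_name="tri_trainlist.txt"):
    """Return (first, middle, last) frame paths for every listed sequence."""
    root = Path(data_dir)
    triplets = []
    for line in (root / list_name).read_text().splitlines():
        seq = line.strip()
        if not seq:
            continue  # skip blank lines in the list file
        folder = root / "sequences" / seq
        # im1 and im3 are the two inputs; im2 is the ground-truth middle frame.
        triplets.append((folder / "im1.png", folder / "im2.png", folder / "im3.png"))
    return triplets
```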

Start training

$ python train.py --gpu_id 0 --data_dir path/to/vimeo_triplet/ --batch_size 8

It will generate a unique ID for each training run, and all the intermediate results/records will be saved under model_weights/<training id>/. For a GPU device with memory around 10GB, the --batch_size can take a value as large as 3; otherwise CUDA may run out of memory. Many other training options, e.g., --lr, --epochs, --loss and so on, can be found in train.py.

Apply CDFI to New Models

One nice thing about CDFI is that the framework can be easily applied to other (heavy) DNN models and potentially boost their performance. The key to CDFI is the optimization-based compression that compresses a model via fine-grained pruning. In particular, we use the efficient and easy-to-use sparsity-inducing optimizer OBPROXSG (see also paper), and summarize the compression procedure for any other model as follows.

1. Copy the OBPROXSG optimizer, which is already implemented as torch.optim.optimizer, to your working directory
2. Starting from a pre-trained model, fine-tune its weights using the OBPROXSG optimizer, just as you would use any standard PyTorch built-in optimizer such as SGD or Adam
   - It is not necessary to use the full dataset for this fine-tuning process
3. The parameters for the OBPROXSG optimizer
   - lr: learning rate
   - lambda_: coefficient of the L1 regularization term
   - epochSize: number of batches in an epoch
   - Np: number of proximal steps, which is set to 2 for pruning AdaCoF
   - No: number of orthant steps (key step to promote sparsity), for which we recommend using the default setting
   - eps: threshold for trimming zeros, which is set to 0.0001 for pruning AdaCoF
4. After the optimization is done (either by reaching a maximum number of epochs or by achieving a high sparsity), use the layer density as the compression ratio for that layer, as described in the paper
5. As an example, compare the architectures in models/adacof.py and models/compressed_adacof.py for compressing AdaCoF with the above procedure
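
To give a feel for what the two kinds of steps and the layer density mean, here is a simplified, dependency-free sketch operating on flat lists of weights. It is not the OBPROXSG implementation shipped with the repository: the function names are hypothetical, the proximal step is the textbook soft-thresholding update for L1 regularization, the orthant step is a simplified version of a sign-preserving projected gradient step, and the way `eps` is used to count non-zeros is an assumption.

```python
import math


def proximal_step(weights, grads, lr, lambda_):
    """Gradient step followed by soft-thresholding (the prox of lr * lambda_ * |w|)."""
    updated = []
    for w, g in zip(weights, grads):
        z = w - lr * g                            # plain gradient step
        shrink = max(abs(z) - lr * lambda_, 0.0)  # shrink the magnitude toward zero
        updated.append(0.0 if shrink == 0.0 else math.copysign(shrink, z))
    return updated


def orthant_step(weights, grads, lr, lambda_):
    """Gradient step inside the current orthant; sign flips are projected to zero."""
    updated = []
    for w, g in zip(weights, grads):
        if w == 0.0:
            updated.append(0.0)  # weights that are already zero stay zero
            continue
        sign = 1.0 if w > 0 else -1.0
        # Within a fixed orthant, lambda_ * |w| is smooth with gradient lambda_ * sign.
        z = w - lr * (g + lambda_ * sign)
        updated.append(z if z * sign > 0 else 0.0)
    return updated


def layer_density(weights, eps=1e-4):
    """Fraction of weights in a layer whose magnitude exceeds eps."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if abs(w) > eps) / len(weights)
```

For example, `layer_density([0.0, 0.5, -0.00001, 2.0])` is 0.5, meaning half of that layer's weights survived pruning. How the density is turned into the channel counts of the compressed architecture is described in the paper and is not reproduced here.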

The compressed model is now ready for further improvements/modifications, based on an understanding of its flaws/drawbacks.

Citation

Coming soon...

Acknowledgements

The code is largely based on HyeongminLEE/AdaCoF-pytorch and baowenbo/DAIN.
