Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Overview

Spatio-Temporal Entropy Model

A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

More details can be found in the following paper:

Spatiotemporal Entropy Model is All You Need for Learned Video Compression
Alibaba Group, arxiv 2021.4.13
Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Dongyang Li, Yichen Qian, Hao Li

Note that It Is Not An Official Implementation Code.

The differences with the original paper are not limited to the following:

  • The number of model channels are fewer.
  • The Encoder/Decoder in original paper consists of conditional conv1 to support various rate in one single model. And the architecture is the same as [2]2. However, I only use the single rate Encoder/Decoder with the same architecture as [2]2

ToDo:

  • 1. various rate model training and evaluation.

Environment

  • Python == 3.7.10
  • Pytorch == 1.7.1
  • CompressAI

Dataset

I use the Vimeo90k Septuplet Dataset to train the models. The Dataset contains about 64612 training sequences and 7824 testing sequences. All sequence contains 7 frames.

The train dataset folder structure is as

.dataset/vimeo_septuplet/
│  sep_testlist.txt
│  sep_trainlist.txt
│  vimeo_septuplet.txt
│  
├─sequences
│  ├─00001
│  │  ├─0001
│  │  │      f001.png
│  │  │      f002.png
│  │  │      f003.png
│  │  │      f004.png
│  │  │      f005.png
│  │  │      f006.png
│  │  │      f007.png
│  │  ├─0002
│  │  │      f001.png
│  │  │      f002.png
│  │  │      f003.png
│  │  │      f004.png
│  │  │      f005.png
│  │  │      f006.png
│  │  │      f007.png
...

I evaluate the model on UVG & HEVC TEST SEQUENCE Dataset. The test dataset folder structure is as

.dataset/UVG/
├─PNG
│  ├─Beauty
│  │      f001.png
│  │      f002.png
│  │      f003.png
│  │      ...
│  │      f598.png
│  │      f599.png
│  │      f600.png
│  │      
│  ├─HoneyBee
│  │      f001.png
│  │      f002.png
│  │      f003.png
│  │      ...
│  │      f598.png
│  │      f599.png
│  │      f600.png
│  │     
│  │      ...
.dataset/HEVC/
├─BasketballDrill
│      f001.png
│      f002.png
│      f003.png
│      ...
│      f098.png
│      f099.png
│      f100.png
│      
├─BasketballDrive
│      f001.png
│      f002.png
│      ...

Train Your Own Model

python3 trainSTEM.py -d /path/to/your/image/dataset/vimeo_septuplet --lambda 0.01 -lr 1e-4 --batch-size 16 --model-save /path/to/your/model/save/dir --cuda --checkpoint /path/to/your/iframecompressor/checkpoint.pth.tar

I tried to train with Mean-Scale Hyperprior / Joint Autoregressive Hierarchical Priors / Cheng2020Attn in CompressAI library and find that a powerful I Frame Compressor does have great performance benefits.

Evaluate Your Own Model

python3 evalSTEM.py --checkpoint /path/to/your/iframecompressor/checkpoint.pth.tar --entropy-model-path /path/to/your/stem/checkpoint.pth.tar

Currently only support evaluation on UVG & HEVC TEST SEQUENCE Dataset.

Result

测试数据集UVG PSNR BPP PSNR in paper BPP in paper
SpatioTemporalPriorModel_Res 36.104 0.087 35.95 0.080
SpatioTemporalPriorModel 36.053 0.080 35.95 0.082
SpatioTemporalPriorModelWithoutTPM None None 35.95 0.100
SpatioTemporalPriorModelWithoutSPM 36.066 0.080 35.95 0.087
SpatioTemporalPriorModelWithoutSPMTPM 36.021 0.141 35.95 0.123

PSNR in paper & BPP in paper is estimated from Figure 6 in the original paper.

It seems that the context model SPM has no good effect in my experiments.

I look forward to receiving more feedback on the test results, and feel free to share your test results!

More Informations About Various Rate Model Training

As stated in the original paper, they use a variable-rate auto-encoder to support various rate in one single model. I tried to train STEM with GainedVAE, which is also a various rate model. Some point can achieve comparable r-d performance while others may degrade. What's more, the interpolation result could have more performance degradation cases.

Probably we need Loss Modulator3 for various rate model training. Read Oren Ripple's ICCV 2021 paper3 for more details.

Acknowledgement

The framework is based on CompressAI, I add the model in compressai.models.spatiotemporalpriors. And trainSTEM.py/evalSTEM.py is modified with reference to compressai_examples

Reference

[1] [Variable Rate Deep Image Compression With a Conditional Autoencoder](https://openaccess.thecvf.com/content_ICCV_2019/html/Choi_Variable_Rate_Deep_Image_Compression_With_a_Conditional_Autoencoder_ICCV_2019_paper.html)
[2] [Joint Autoregressive and Hierarchical Priors for Learned Image Compression](https://arxiv.org/abs/1809.02736)
[3] [ELF-VC Efficient Learned Flexible-Rate Video Coding](https://arxiv.org/abs/2104.14335)

Contact

Feel free to contact me if there is any question about the code or to discuss any problems with image and video compression. ([email protected])

Unified file system operation experience for different backend

megfile - Megvii FILE library Docs: http://megvii-research.github.io/megfile megfile provides a silky operation experience with different backends (cu

MEGVII Research 76 Dec 14, 2022
The Pytorch code of "Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification", CVPR 2022 (Oral).

DeepBDC for few-shot learning        Introduction In this repo, we provide the implementation of the following paper: "Joint Distribution Matters: Dee

FeiLong 116 Dec 19, 2022
NasirKhusraw - The TSP solved using genetic algorithm and show TSP path overlaid on a map of the Iran provinces & their capitals.

Nasir Khusraw : Travelling Salesman Problem The TSP solved using genetic algorithm. This project show TSP path overlaid on a map of the Iran provinces

J Brave 2 Sep 01, 2022
PyTorch implementation of our Adam-NSCL algorithm from our CVPR2021 (oral) paper "Training Networks in Null Space for Continual Learning"

Adam-NSCL This is a PyTorch implementation of Adam-NSCL algorithm for continual learning from our CVPR2021 (oral) paper: Title: Training Networks in N

Shipeng Wang 34 Dec 21, 2022
Implementation of Bidirectional Recurrent Independent Mechanisms (Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules)

BRIMs Bidirectional Recurrent Independent Mechanisms Implementation of the paper Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neura

Sarthak Mittal 26 May 26, 2022
Computer Vision is an elective course of MSAI, SCSE, NTU, Singapore

[AI6122] Computer Vision is an elective course of MSAI, SCSE, NTU, Singapore. The repository corresponds to the AI6122 of Semester 1, AY2021-2022, starting from 08/2021. The instructor of this course

HT. Li 5 Sep 12, 2022
buildseg is a building extraction plugin of QGIS based on PaddlePaddle.

buildseg buildseg is a building extraction plugin of QGIS based on PaddlePaddle. TODO Extract building on 512x512 remote sensing images. Extract build

Yizhou Chen 11 Sep 26, 2022
This repository lets you interact with Lean through a REPL.

lean-gym This repository lets you interact with Lean through a REPL. See Formal Mathematics Statement Curriculum Learning for a presentation of lean-g

OpenAI 87 Dec 28, 2022
Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021)

Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021) This repository contains the code

149 Dec 15, 2022
Data for "Driving the Herd: Search Engines as Content Influencers" paper

herding_data Data for "Driving the Herd: Search Engines as Content Influencers" paper Dataset description The collection contains 2250 documents, 30 i

0 Aug 17, 2021
Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

SwinTextSpotter This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text R

mxin262 183 Jan 03, 2023
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.

NeRF-pytorch NeRF (Neural Radiance Fields) is a method that achieves state-of-the-art results for synthesizing novel views of complex scenes. Here are

Yen-Chen Lin 3.2k Jan 08, 2023
Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

📖 Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022) 🔥 If DaGAN is helpful in your photos/projects, please hel

Fa-Ting Hong 503 Jan 04, 2023
Pun Detection and Location

Pun Detection and Location “The Boating Store Had Its Best Sail Ever”: Pronunciation-attentive Contextualized Pun Recognition Yichao Zhou, Jyun-yu Jia

lawson 3 May 13, 2022
Official Pytorch implementation of 'GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network' (NeurIPS 2020)

Official implementation of GOCor This is the official implementation of our paper : GOCor: Bringing Globally Optimized Correspondence Volumes into You

Prune Truong 71 Nov 18, 2022
A pre-trained language model for social media text in Spanish

RoBERTuito A pre-trained language model for social media text in Spanish READ THE FULL PAPER Github Repository RoBERTuito is a pre-trained language mo

25 Dec 29, 2022
Gans-in-action - Companion repository to GANs in Action: Deep learning with Generative Adversarial Networks

GANs in Action by Jakub Langr and Vladimir Bok List of available code: Chapter 2: Colab, Notebook Chapter 3: Notebook Chapter 4: Notebook Chapter 6: C

GANs in Action 914 Dec 21, 2022
🛰️ List of earth observation companies and job sites

Earth Observation Companies & Jobs source Portals & Jobs Geospatial Geospatial jobs newsletter: ~biweekly newsletter with geospatial jobs by Ali Ahmad

Dahn 64 Dec 27, 2022
Implementation of Convolutional LSTM in PyTorch.

ConvLSTM_pytorch This file contains the implementation of Convolutional LSTM in PyTorch made by me and DavideA. We started from this implementation an

Andrea Palazzi 1.3k Dec 29, 2022
Visual odometry package based on hardware-accelerated NVIDIA Elbrus library with world class quality and performance.

Isaac ROS Visual Odometry This repository provides a ROS2 package that estimates stereo visual inertial odometry using the Isaac Elbrus GPU-accelerate

NVIDIA Isaac ROS 343 Jan 03, 2023