MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition;

Overview

MoViNet-pytorch

Open In Colab Paper

Pytorch unofficial implementation of MoViNets: Mobile Video Networks for Efficient Video Recognition.
Authors: Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong (Google Research)
[Authors' Implementation]

Stream Buffer

stream buffer

Clean stream buffer

It is required to clean the buffer after all the clips of the same video have been processed.

model.clean_activation_buffers()

Usage

Open In Colab
Click on "Open in Colab" to open an example of training on HMDB-51

installation

pip install git+https://github.com/Atze00/MoViNet-pytorch.git

How to build a model

Use causal = True to use the model with stream buffer, causal = False will use standard convolutions

from movinets import MoViNet
from movinets.config import _C

MoViNetA0 = MoViNet(_C.MODEL.MoViNetA0, causal = True, pretrained = True )
MoViNetA1 = MoViNet(_C.MODEL.MoViNetA1, causal = True, pretrained = True )
...
Load weights

Use pretrained = True to use the model with pretrained weights

    """
    If pretrained is True:
        num_classes is set to 600,
        conv_type is set to "3d" if causal is False, "2plus1d" if causal is True
        tf_like is set to True
    """
model = MoViNet(_C.MODEL.MoViNetA0, causal = True, pretrained = True )
model = MoViNet(_C.MODEL.MoViNetA0, causal = False, pretrained = True )

Training loop examples

Training loop with stream buffer

def train_iter(model, optimz, data_load, n_clips = 5, n_clip_frames=8):
    """
    In causal mode with stream buffer a single video is fed to the network
    using subclips of lenght n_clip_frames. 
    n_clips*n_clip_frames should be equal to the total number of frames presents
    in the video.
    
    n_clips : number of clips that are used
    n_clip_frames : number of frame contained in each clip
    """
    
    #clean the buffer of activations
    model.clean_activation_buffers()
    optimz.zero_grad()
    for i, data, target in enumerate(data_load):
        #backward pass for each clip
        for j in range(n_clips):
          out = F.log_softmax(model(data[:,:,(n_clip_frames)*(j):(n_clip_frames)*(j+1)]), dim=1)
          loss = F.nll_loss(out, target)/n_clips
          loss.backward()
        optimz.step()
        optimz.zero_grad()
        
        #clean the buffer of activations
        model.clean_activation_buffers()

Training loop with standard convolutions

def train_iter(model, optimz, data_load):

    optimz.zero_grad()
    for i, (data,_ , target) in enumerate(data_load):
        out = F.log_softmax(model(data), dim=1)
        loss = F.nll_loss(out, target)
        loss.backward()
        optimz.step()
        optimz.zero_grad()

Pretrained models

Weights

The weights are loaded from the tensorflow models released by the authors, trained on kinetics.

Base Models

Base models implement standard 3D convolutions without stream buffers.

Model Name Top-1 Accuracy* Top-5 Accuracy* Input Shape
MoViNet-A0-Base 72.28 90.92 50 x 172 x 172
MoViNet-A1-Base 76.69 93.40 50 x 172 x 172
MoViNet-A2-Base 78.62 94.17 50 x 224 x 224
MoViNet-A3-Base 81.79 95.67 120 x 256 x 256
MoViNet-A4-Base 83.48 96.16 80 x 290 x 290
MoViNet-A5-Base 84.27 96.39 120 x 320 x 320
Model Name Top-1 Accuracy* Top-5 Accuracy* Input Shape**
MoViNet-A0-Stream 72.05 90.63 50 x 172 x 172
MoViNet-A1-Stream 76.45 93.25 50 x 172 x 172
MoViNet-A2-Stream 78.40 94.05 50 x 224 x 224

**In streaming mode, the number of frames correspond to the total accumulated duration of the 10-second clip.

*Accuracy reported on the official repository for the dataset kinetics 600, It has not been tested by me. It should be the same since the tf models and the reimplemented pytorch models output the same results [Test].

I currently haven't tested the speed of the streaming models, feel free to test and contribute.

Status

Currently are available the pretrained models for the following architectures:

  • MoViNetA1-BASE
  • MoViNetA1-STREAM
  • MoViNetA2-BASE
  • MoViNetA2-STREAM
  • MoViNetA3-BASE
  • MoViNetA3-STREAM
  • MoViNetA4-BASE
  • MoViNetA4-STREAM
  • MoViNetA5-BASE
  • MoViNetA5-STREAM

I currently have no plans to include streaming version of A3,A4,A5. Those models are too slow for most mobile applications.

Testing

I recommend to create a new environment for testing and run the following command to install all the required packages:
pip install -r tests/test_requirements.txt

Citations

@article{kondratyuk2021movinets,
  title={MoViNets: Mobile Video Networks for Efficient Video Recognition},
  author={Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Matthew Brown, and Boqing Gong},
  journal={arXiv preprint arXiv:2103.11511},
  year={2021}
}
Data Preparation, Processing, and Visualization for MoVi Data

MoVi-Toolbox Data Preparation, Processing, and Visualization for MoVi Data, https://www.biomotionlab.ca/movi/ MoVi is a large multipurpose dataset of

Saeed Ghorbani 51 Nov 27, 2022
MATLAB codes of the book "Digital Image Processing Fourth Edition" converted to Python

Digital Image Processing Python MATLAB codes of the book "Digital Image Processing Fourth Edition" converted to Python TO-DO: Refactor scripts, curren

Merve Noyan 24 Oct 16, 2022
PyTorch implementations of neural network models for keyword spotting

Honk: CNNs for Keyword Spotting Honk is a PyTorch reimplementation of Google's TensorFlow convolutional neural networks for keyword spotting, which ac

Castorini 475 Dec 15, 2022
A Real-ESRGAN equipped Colab notebook for CLIP Guided Diffusion

#360Diffusion automatically upscales your CLIP Guided Diffusion outputs using Real-ESRGAN. Latest Update: Alpha 1.61 [Main Branch] - 01/11/22 Layout a

78 Nov 02, 2022
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020

Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020 BibTeX @INPROCEEDINGS{punnappurath2020modeling, author={Abhi

Abhijith Punnappurath 22 Oct 01, 2022
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
Numerical Methods with Python, Numpy and Matplotlib

Numerical Bric-a-Brac Collections of numerical techniques with Python and standard computational packages (Numpy, SciPy, Numba, Matplotlib ...). Diffe

Vincent Bonnet 10 Dec 20, 2021
LRBoost is a scikit-learn compatible approach to performing linear residual based stacking/boosting.

LRBoost is a sckit-learn compatible package for linear residual boosting. LRBoost combines a linear estimator and a non-linear estimator to leverage t

Andrew Patton 5 Nov 23, 2022
Demo code for paper "Learning optical flow from still images", CVPR 2021.

Depthstillation Demo code for "Learning optical flow from still images", CVPR 2021. [Project page] - [Paper] - [Supplementary] This code is provided t

130 Dec 25, 2022
Moiré Attack (MA): A New Potential Risk of Screen Photos [NeurIPS 2021]

Moiré Attack (MA): A New Potential Risk of Screen Photos [NeurIPS 2021] This repository is the official implementation of Moiré Attack (MA): A New Pot

Dantong Niu 22 Dec 24, 2022
A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

A PyTorch implementation of V-Net Vnet is a PyTorch implementation of the paper V-Net: Fully Convolutional Neural Networks for Volumetric Medical Imag

Matthew Macy 606 Dec 21, 2022
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Discriminative Sounding Objects Localization Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovis

51 Dec 11, 2022
✨风纪委员会自动投票脚本,利用Github Action帮你进行裁决操作(为了让其他风纪委员有案件可判,本程序从中午12点才开始运行,有需要请自己修改运行时间)

风纪委员会自动投票 本脚本通过使用Github Action来实现B站风纪委员的自动投票功能,喜欢请给我点个STAR吧! 如果你不是风纪委员,在符合风纪委员申请条件的情况下,本脚本会自动帮你申请 投票时间是早上八点,如果有需要请自行修改.github/workflows/Judge.yml中的时间,

Pesy Wu 25 Feb 17, 2021
Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity by Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elen

VITA 18 Dec 31, 2022
[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Contents Cycle-In-Cycle GANs Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Acknowledgments Relat

Hao Tang 67 Dec 14, 2022
Deep Face Recognition in PyTorch

Face Recognition in PyTorch By Alexey Gruzdev and Vladislav Sovrasov Introduction A repository for different experimental Face Recognition models such

Alexey Gruzdev 141 Sep 11, 2022
This is a demo app to be used in the video streaming applications

MoViDNN: A Mobile Platform for Evaluating Video Quality Enhancement with Deep Neural Networks MoViDNN is an Android application that can be used to ev

ATHENA Christian Doppler (CD) Laboratory 7 Jul 21, 2022
Auto White-Balance Correction for Mixed-Illuminant Scenes

Auto White-Balance Correction for Mixed-Illuminant Scenes Mahmoud Afifi, Marcus A. Brubaker, and Michael S. Brown York University Video Reference code

Mahmoud Afifi 47 Nov 26, 2022
NeurIPS 2021 Datasets and Benchmarks Track

AP-10K: A Benchmark for Animal Pose Estimation in the Wild Introduction | Updates | Overview | Download | Training Code | Key Questions | License Intr

AP-10K 82 Dec 11, 2022