PyTorch implementation of the R2Plus1D convolution-based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition".

Last update: Dec 16, 2022

Overview

R2Plus1D-PyTorch

Link to original: paper and code

NOTE: This repository has been archived, although forks and other work that build on it remain welcome.

Requirements

R2Plus1D-PyTorch has the following requirements:

PyTorch 0.4 and dependencies
OpenCV (tested on 3.4.0.12)
tqdm (for progress bars)

About this repository

This repository consists of four Python files:

module.py - Contains the factored R2Plus1D convolution that the rest of the implementation is built around. It is designed as a replacement for nn.Conv3d in the appropriate scenarios.
network.py - Uses module.py to build the residual network described in the paper.
dataset.py - Implements a PyTorch dataset that can load videos with their labels from a given directory.
trainer.py - A lightly modified version of the training script from the PyTorch tutorials. Supports saving and restoring checkpoints.
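
To illustrate what a factored convolution standing in for nn.Conv3d can look like, here is a minimal sketch based on the paper's description rather than on the actual contents of module.py. The class name SpatioTemporalConvSketch, its argument defaults, and the assumption of a square spatial kernel are all hypothetical. The intermediate channel count follows the paper's formula for keeping the parameter count close to that of a full 3D convolution.

```python
import math

import torch
import torch.nn as nn


class SpatioTemporalConvSketch(nn.Module):
    """Hypothetical (2+1)D block: a spatial conv followed by a temporal conv.

    Illustrative only; not the repository's module.py.
    """

    def __init__(self, in_channels, out_channels,
                 kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)):
        super().__init__()
        t, d, _ = kernel_size  # assumes a square d x d spatial kernel

        # Intermediate width from the paper, chosen so the factored block has
        # roughly as many parameters as a full t x d x d 3D convolution.
        mid_channels = math.floor(
            (t * d * d * in_channels * out_channels)
            / (d * d * in_channels + t * out_channels)
        )

        # 1 x d x d convolution: mixes information within each frame only.
        self.spatial = nn.Conv3d(
            in_channels, mid_channels, kernel_size=(1, d, d),
            stride=(1, stride[1], stride[2]),
            padding=(0, padding[1], padding[2]), bias=False,
        )
        self.bn = nn.BatchNorm3d(mid_channels)
        self.relu = nn.ReLU()

        # t x 1 x 1 convolution: mixes information across frames only.
        self.temporal = nn.Conv3d(
            mid_channels, out_channels, kernel_size=(t, 1, 1),
            stride=(stride[0], 1, 1),
            padding=(padding[0], 0, 0), bias=False,
        )

    def forward(self, x):
        # x has shape (batch, channels, frames, height, width)
        x = self.relu(self.bn(self.spatial(x)))
        return self.temporal(x)
```

With the default arguments, the block should produce the same output shape as nn.Conv3d(in_channels, out_channels, 3, padding=1), which is what makes it usable as a stand-in inside a residual network.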

Training on Kinetics-400/600

This repository does not include a crawler or downloader for the Kinetics-400/600 dataset; however, one can be found here. It is strongly recommended to downsample the videos before training (rather than on the fly), using a tool such as ffmpeg. If using the crawler, this can be done by adding "-vf", "scale=172:128" to the ffmpeg command list in the download clip function.
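
For videos that are already on disk, the same downsampling can be done offline. The following is a rough sketch, not part of this repository: the helper names build_downsample_command and downsample are hypothetical, and it assumes ffmpeg is installed and on the PATH. It builds an argument list that passes the scale filter mentioned above.

```python
import subprocess


def build_downsample_command(src, dst, width=172, height=128):
    """Return an ffmpeg argument list that rescales src and writes it to dst.

    Hypothetical helper for illustration; the defaults mirror the
    "scale=172:128" filter suggested above.
    """
    return ["ffmpeg", "-i", str(src), "-vf", f"scale={width}:{height}", str(dst)]


def downsample(src, dst, width=172, height=128):
    """Run ffmpeg once for a single video; raises if ffmpeg exits with an error."""
    subprocess.run(build_downsample_command(src, dst, width, height), check=True)
```

Calling downsample("clip.mp4", "clip_small.mp4") would then write a 172x128 copy of the clip, so the resize cost is paid once rather than on every training epoch.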

Training in general

This repository is designed so that the ResNet can be trained on any dataset of videos, using the VideoDataloader class from dataset.py. It expects the videos to be arranged as: directory -> [train/val] folders -> [class_label] folders (one for each class) -> videos (the files themselves).
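
As a quick way to check that a dataset follows this layout before training, the sketch below walks the directory tree and pairs each video with an integer class label. It is an illustration only and does not reproduce dataset.py: the function name list_labeled_videos, the default file extensions, and the choice to number classes in sorted order are all assumptions.

```python
from pathlib import Path


def list_labeled_videos(root, split="train", extensions=(".mp4", ".avi")):
    """Collect (video_path, label_index) pairs from root/split/class_label/.

    Hypothetical helper; class indices are assigned in sorted folder order.
    """
    split_dir = Path(root) / split

    # Each sub-folder of the split directory is treated as one class.
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    label_of = {name: index for index, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for video in sorted((split_dir / name).iterdir()):
            # Skip anything that does not look like a video file.
            if video.suffix.lower() in extensions:
                samples.append((video, label_of[name]))
    return samples, class_names
```

Running it on both the train and val folders and comparing the returned class names is a cheap way to catch a class folder that is missing from one of the splits.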

Forks and fixes of this repo are highly welcome!

Owner

Irhum Shafkat
