AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Last update: Dec 26, 2022

Overview

AdaFocusV2

This repo contains the official code and pre-trained models for AdaFocusV2.

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Introduction

Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy. As a representative work, the adaptive focus method (AdaFocus) has achieved a favorable trade-off between accuracy and inference speed by dynamically identifying and attending to the informative regions in each video frame. However, AdaFocus requires a complicated three-stage training pipeline (involving reinforcement learning), which leads to slow convergence and makes it unfriendly to practitioners. This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable, interpolation-based patch selection operation, enabling efficient end-to-end optimization. We further present an improved training scheme to address the issues introduced by the one-stage formulation, namely the lack of supervision, limited input diversity, and training instability. Moreover, a conditional-exit technique is proposed to perform temporal adaptive computation on top of AdaFocus without additional training. Extensive experiments on six benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, and Jester) demonstrate that our model significantly outperforms the original AdaFocus and other competitive baselines, while being considerably simpler and more efficient to train.
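The key change in AdaFocusV2 is that the patch is cropped by interpolation rather than by a hard, non-differentiable selection, so the patch location can be learned by backpropagation. The sketch below illustrates that idea in pure Python: it crops a patch centred at continuous coordinates using bilinear interpolation, so the patch values change smoothly with the centre and a gradient with respect to the centre exists. This is an illustrative sketch, not the official implementation; the function names `bilinear_sample`, `crop_patch` and `mean_patch_grad_cx` are hypothetical, and a real model would do the same sampling on framework tensors so that autograd computes the gradient instead of the finite difference used here.

```python
import math


def bilinear_sample(frame, x, y):
    """Sample a 2-D frame (list of rows, H x W) at continuous (x, y).

    Coordinates are clamped to the frame, so the gradient w.r.t. the
    location is zero outside the valid region (a simplifying assumption).
    """
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - dx) * frame[y0][x0] + dx * frame[y0][x1]
    bottom = (1 - dx) * frame[y1][x0] + dx * frame[y1][x1]
    return (1 - dy) * top + dy * bottom


def crop_patch(frame, cx, cy, size):
    """Crop a size x size patch centred at the continuous point (cx, cy)."""
    half = (size - 1) / 2.0
    return [[bilinear_sample(frame, cx - half + j, cy - half + i)
             for j in range(size)]
            for i in range(size)]


def mean_patch_grad_cx(frame, cx, cy, size, eps=1e-3):
    """Central finite-difference estimate of d(mean patch value)/d(cx)."""
    def mean(c):
        patch = crop_patch(frame, c, cy, size)
        return sum(map(sum, patch)) / (size * size)
    return (mean(cx + eps) - mean(cx - eps)) / (2 * eps)


# Usage: an 8x8 "ramp" frame whose pixel value equals its column index.
ramp = [[float(j) for j in range(8)] for _ in range(8)]
aligned = crop_patch(ramp, 3.5, 3.5, 2)   # centre between pixels 3 and 4
shifted = crop_patch(ramp, 3.25, 3.5, 2)  # sub-pixel shift to the left
grad = mean_patch_grad_cx(ramp, 3.25, 3.5, 2)
print(aligned, shifted, grad)
```

On the ramp frame, shifting the centre by a quarter pixel shifts every sampled value by exactly a quarter, and the estimated gradient of the mean patch value with respect to `cx` is about 1.0. With hard integer cropping, the same gradient would be zero almost everywhere, which is one reason the original AdaFocus needed reinforcement learning to train the patch selector.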

Results

Compared with AdaFocusV1

ActivityNet, FCVID and Mini-Kinetics

Something-Something V1&V2 and Jester

Visualization

Get Started

Please see the folders "Experiments on ActivityNet, FCVID and Mini-Kinetics" and "Experiments on Sth-Sth and Jester" for detailed instructions.

Contact

If you have any questions, feel free to contact the authors or raise an issue. Yulin Wang: [email protected].
