UniFormer - official implementation of UniFormer

Overview

UniFormer

This repo is the official implementation of "Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning". It currently includes code and models for the following tasks:

Updates

01/13/2022

[Initial commits]:

  1. Pretrained models on ImageNet-1K, Kinetics-400, Kinetics-600, Something-Something V1&V2

  2. The supported code and models for image classification and video classification are provided.

Introduction

UniFormer (Unified transFormer) is introduce in arxiv, which effectively unifies 3D convolution and spatiotemporal self-attention in a concise transformer format. We adopt local MHRA in shallow layers to largely reduce computation burden and global MHRA in deep layers to learn global token relation.

UniFormer achieves strong performance on video classification. With only ImageNet-1K pretraining, our UniFormer achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600, while requiring 10x fewer GFLOPs than other comparable methods (e.g., 16.7x fewer GFLOPs than ViViT with JFT-300M pre-training). For Something-Something V1 and V2, our UniFormer achieves 60.9% and 71.2% top-1 accuracy respectively, which are new state-of-the-art performances.

teaser

Main results on ImageNet-1K

Please see image_classification for more details.

More models with large resolution and token labeling will be released soon.

Model Pretrain Resolution Top-1 #Param. FLOPs
UniFormer-S ImageNet-1K 224x224 82.9 22M 3.6G
UniFormer-S† ImageNet-1K 224x224 83.4 24M 4.2G
UniFormer-B ImageNet-1K 224x224 83.9 50M 8.3G

Main results on Kinetics-400

Please see video_classification for more details.

Model Pretrain #Frame Sampling Method FLOPs K400 Top-1 K600 Top-1
UniFormer-S ImageNet-1K 16x1x4 16x4 167G 80.8 82.8
UniFormer-S ImageNet-1K 16x1x4 16x8 167G 80.8 82.7
UniFormer-S ImageNet-1K 32x1x4 32x4 438G 82.0 -
UniFormer-B ImageNet-1K 16x1x4 16x4 387G 82.0 84.0
UniFormer-B ImageNet-1K 16x1x4 16x8 387G 81.7 83.4
UniFormer-B ImageNet-1K 32x1x4 32x4 1036G 82.9 84.5*

* Since Kinetics-600 is too large to train (>1 month in single node with 8 A100 GPUs), we provide model trained in multi node (around 2 weeks with 32 V100 GPUs), but the result is lower due to the lack of tuning hyperparameters.

Main results on Something-Something

Please see video_classification for more details.

Model Pretrain #Frame FLOPs SSV1 Top-1 SSV2 Top-1
UniFormer-S K400 16x3x1 125G 57.2 67.7
UniFormer-S K600 16x3x1 125G 57.6 69.4
UniFormer-S K400 32x3x1 329G 58.8 69.0
UniFormer-S K600 32x3x1 329G 59.9 70.4
UniFormer-B K400 16x3x1 290G 59.1 70.4
UniFormer-B K600 16x3x1 290G 58.8 70.2
UniFormer-B K400 32x3x1 777G 60.9 71.1
UniFormer-B K600 32x3x1 777G 61.0 71.2

Main results on downstream tasks

We have conducted extensive experiments on downstream tasks and achieved comparable results with SOTA models.

Code and models will be released in two weeks.

Cite Uniformer

If you find this repository useful, please use the following BibTeX entry for citation.

@misc{li2022uniformer,
      title={Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning}, 
      author={Kunchang Li and Yali Wang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
      year={2022},
      eprint={2201.04676},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Contributors and Contact Information

UniFormer is maintained by Kunchang Li.

For help or issues using UniFormer, please submit a GitHub issue.

For other communications related to UniFormer, please contact Kunchang Li ([email protected]).

Owner
SenseTime X-Lab
Powered by X-Lab, SenseTime Research
SenseTime X-Lab
Main repository for the HackBio'2021 Virtual Internship Experience for #Team-Greider ❀️

Hello 🀟 #Team-Greider The team of 20 people for HackBio'2021 Virtual Bioinformatics Internship πŸ’ πŸ–¨οΈ πŸ‘¨β€πŸ’» HackBio: https://thehackbio.com πŸ’¬ Ask us

Siddhant Sharma 7 Oct 20, 2022
This is the repo for our work "Towards Persona-Based Empathetic Conversational Models" (EMNLP 2020)

Towards Persona-Based Empathetic Conversational Models (PEC) This is the repo for our work "Towards Persona-Based Empathetic Conversational Models" (E

Zhong Peixiang 35 Nov 17, 2022
Hierarchical Attentive Recurrent Tracking

Hierarchical Attentive Recurrent Tracking This is an official Tensorflow implementation of single object tracking in videos by using hierarchical atte

Adam Kosiorek 147 Aug 07, 2021
ruptures: change point detection in Python

Welcome to ruptures ruptures is a Python library for off-line change point detection. This package provides methods for the analysis and segmentation

Charles T. 1.1k Jan 03, 2023
Repository sharing code and the model for the paper "Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes"

Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes Setup virtualenv -p python3 venv source venv/bin/activate pip instal

Planet AI GmbH 9 May 20, 2022
Self-Supervised Learning with Kernel Dependence Maximization

Self-Supervised Learning with Kernel Dependence Maximization This is the code for SSL-HSIC, a self-supervised learning loss proposed in the paper Self

DeepMind 29 Dec 29, 2022
The FIRST GANs-based omics-to-omics translation framework

OmiTrans Please also have a look at our multi-omics multi-task DL freamwork πŸ‘€ : OmiEmbed The FIRST GANs-based omics-to-omics translation framework Xi

Xiaoyu Zhang 6 Dec 14, 2022
Neural Contours: Learning to Draw Lines from 3D Shapes (CVPR2020)

Neural Contours: Learning to Draw Lines from 3D Shapes This repository contains the PyTorch implementation for CVPR 2020 Paper "Neural Contours: Learn

93 Dec 16, 2022
TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.

TensorFlow-Image-Models Introduction Usage Models Profiling License Introduction TensorfFlow-Image-Models (tfimm) is a collection of image models with

Martins Bruveris 227 Dec 20, 2022
This repo is to be freely used by ML devs to check the GAN performances without coding from scratch.

GANs for Fun Created because I can! GOAL The goal of this repo is to be freely used by ML devs to check the GAN performances without coding from scrat

Sagnik Roy 13 Jan 26, 2022
Source code of SIGIR2021 Paper 'One Chatbot Per Person: Creating Personalized Chatbots based on Implicit Profiles'

DHAP Source code of SIGIR2021 Long Paper: One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles . Preinstallation Fir

ZYMa 32 Dec 06, 2022
Repository for the NeurIPS 2021 paper: "Exploiting Domain-Specific Features to Enhance Domain Generalization".

meta-Domain Specific-Domain Invariant (mDSDI) Source code implementation for the paper: Manh-Ha Bui, Toan Tran, Anh Tuan Tran, Dinh Phung. "Exploiting

VinAI Research 12 Nov 25, 2022
A Loss Function for Generative Neural Networks Based on Watson’s Perceptual Model

This repository contains the similarity metrics designed and evaluated in the paper, and instructions and code to re-run the experiments. Implementation in the deep-learning framework PyTorch

Steffen 86 Dec 27, 2022
OverFeat is a Convolutional Network-based image classifier and feature extractor.

OverFeat OverFeat is a Convolutional Network-based image classifier and feature extractor. OverFeat was trained on the ImageNet dataset and participat

593 Dec 08, 2022
Instant-nerf-pytorch - NeRF trained SUPER FAST in pytorch

instant-nerf-pytorch This is WORK IN PROGRESS, please feel free to contribute vi

94 Nov 22, 2022
HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021)

Code for HDR Video Reconstruction HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021) Guanying Chen, Cha

Guanying Chen 64 Nov 19, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
A medical imaging framework for Pytorch

Welcome to MedicalTorch MedicalTorch is an open-source framework for PyTorch, implementing an extensive set of loaders, pre-processors and datasets fo

Christian S. Perone 799 Jan 03, 2023
AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video actio

Data Analytics Lab at Texas A&M University 267 Dec 17, 2022
Voxel Transformer for 3D object detection

Voxel Transformer This is a reproduced repo of Voxel Transformer for 3D object detection. The code is mainly based on OpenPCDet. Introduction We provi

173 Dec 25, 2022