This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

Last update: Dec 13, 2022

Overview

ResT

By Qing-Long Zhang and Yu-Bin Yang

[State Key Laboratory for Novel Software Technology at Nanjing University]

This repo is the official implementation of "ResT: An Efficient Transformer for Visual Recognition". It currently includes code and models for the following tasks:

Image Classification: Included in this repo. See get_started.md for a quick start.

Object Detection and Instance Segmentation: Based on detectron2, coming soon.

ResT was first described in an arXiv paper (arXiv:2105.13677) and serves as a general-purpose backbone for computer vision. It can handle input images of arbitrary size. In addition, ResT reduces the memory cost of standard multi-head self-attention (MSA) and models the interaction between attention heads while preserving their diversity.
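To make the memory claim concrete, here is a rough sketch of how the size of the attention map shrinks when keys and values are spatially downsampled before attention. The function name, the token stride of 4, the single head, and the reduction factor of 8 are illustrative assumptions, not values taken from this repo.

```python
def attention_map_elements(height, width, token_stride=4, heads=1, kv_reduction=1):
    """Count the entries in the attention maps for one image.

    Assumes (hypothetically) that queries come from a feature map downsampled
    by `token_stride`, and that keys/values are downsampled by a further
    factor of `kv_reduction` along each spatial axis.
    """
    q_tokens = (height // token_stride) * (width // token_stride)
    kv_h = max(1, height // (token_stride * kv_reduction))
    kv_w = max(1, width // (token_stride * kv_reduction))
    kv_tokens = kv_h * kv_w
    # Each head stores one (q_tokens x kv_tokens) attention map.
    return heads * q_tokens * kv_tokens


# Standard attention: keys/values keep the full token grid.
standard = attention_map_elements(224, 224, kv_reduction=1)
# Compressed attention: keys/values are reduced 8x per axis (assumed factor).
compressed = attention_map_elements(224, 224, kv_reduction=8)
print(standard, compressed, standard // compressed)
```

Under these assumptions the attention map shrinks by the square of the reduction factor, and the same function accepts any input height and width.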

Main Results on ImageNet with Pretrained Models

ImageNet-1K Pretrained Models

name	resolution	acc@1	acc@5	#params	FLOPs	FPS	1K model
ResT-Lite	224x224	77.2	93.7	10.5M	1.4G	1246	baidu
ResT-Small	224x224	79.6	94.9	13.7M	1.9G	1043	baidu
ResT-Base	224x224	81.6	95.7	30.3M	4.3G	673	baidu
ResT-Large	224x224	83.6	96.3	51.6M	7.9G	429	baidu

Note: the access code for the baidu download links is "rest".
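As a quick way to compare the variants, the numbers in the table can be placed in a small Python structure and queried, for example to pick the most accurate model under a FLOPs budget. The values below are copied from the table; the helper name `best_under_budget` is hypothetical and not part of this repo.

```python
# Rows copied from the ImageNet-1K table:
# (name, top-1 acc, top-5 acc, params in M, FLOPs in G, FPS)
MODELS = [
    ("ResT-Lite", 77.2, 93.7, 10.5, 1.4, 1246),
    ("ResT-Small", 79.6, 94.9, 13.7, 1.9, 1043),
    ("ResT-Base", 81.6, 95.7, 30.3, 4.3, 673),
    ("ResT-Large", 83.6, 96.3, 51.6, 7.9, 429),
]


def best_under_budget(max_gflops):
    """Return the name of the highest top-1 model within the FLOPs budget, or None."""
    candidates = [m for m in MODELS if m[4] <= max_gflops]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[1])[0]


print(best_under_budget(5.0))
```

With a 5 GFLOPs budget this selects ResT-Base; a budget below 1.4 GFLOPs returns None because no listed model fits.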

Citing ResT

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v2},
  year={2021}
}

Owner

zhql
