This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Last update: Dec 18, 2022

Overview

Dynamic-Vision-Transformer (Pytorch)

Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length

Update on 2021/06/01: Released the pre-trained models and the inference code on ImageNet.

Introduction

We develop a Dynamic Vision Transformer (DVT) that automatically configures an appropriate number of tokens for each individual image, significantly improving computational efficiency both theoretically and empirically.
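As a rough illustration of the adaptive-token idea, the sketch below runs a cascade of exits from coarse to fine token grids and stops at the first exit whose softmax confidence reaches a threshold, in line with the "exits" and "confidence thresholds" stored in the checkpoints. It is a minimal, framework-free sketch under stated assumptions: `dynamic_inference`, `exits`, and `thresholds` are hypothetical names, each exit is modeled as a plain callable returning logits, and everything else the real model does between exits is omitted. It is not the repository's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_inference(
    image: object,
    exits: Sequence[Callable[[object], Logits]],
    thresholds: Sequence[float],
) -> Tuple[int, int]:
    """Return (predicted class, index of the exit that produced it).

    exits: hypothetical stand-ins for the model evaluated with increasingly
        many tokens (e.g. 7x7, then 10x10, then 14x14).
    thresholds: confidence thresholds for every exit except the last one,
        which always answers.
    """
    last = len(exits) - 1
    for k, exit_fn in enumerate(exits):
        probs = softmax(exit_fn(image))
        confidence = max(probs)
        prediction = probs.index(confidence)
        # Stop early once the prediction is confident enough.
        if k == last or confidence >= thresholds[k]:
            return prediction, k
    raise ValueError("exits must not be empty")
```

With toy exits returning logits `[0.1, 0.2]`, `[0.0, 3.0]`, and `[5.0, 0.0]` and thresholds `[0.9, 0.9]`, the first exit is only about 52% confident, so the second exit (about 95% confident) answers and the call returns `(1, 1)`; easy images stop at the cheap first exit, hard ones continue.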

Results

Top-1 accuracy on ImageNet vs. GFLOPs

Top-1 accuracy on CIFAR vs. GFLOPs

Top-1 accuracy on ImageNet vs. Throughput

Visualization

Pre-trained Models

Backbone	# of Exits	# of Tokens	Links
T2T-ViT-12	3	7x7-10x10-14x14	Tsinghua Cloud / Google Drive
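The token grids in the table translate directly into sequence lengths. The hypothetical helpers below (not part of the repository) convert the "# of Tokens" entry into token counts per exit and compare the size of the n x n attention-score matrix across exits. That matrix is only one term of a transformer's cost, so the ratios are an illustration, not a GFLOPs estimate.

```python
from typing import List, Tuple


def parse_token_grids(spec: str) -> List[Tuple[int, int]]:
    """Parse a spec such as '7x7-10x10-14x14' into [(7, 7), (10, 10), (14, 14)]."""
    grids = []
    for part in spec.split("-"):
        rows, cols = part.lower().split("x")
        grids.append((int(rows), int(cols)))
    return grids


def token_counts(spec: str) -> List[int]:
    """Number of image tokens at each exit (grid height times width)."""
    return [rows * cols for rows, cols in parse_token_grids(spec)]


def attention_score_ratios(spec: str) -> List[float]:
    """Size of the n x n attention-score matrix at each exit, relative to the first exit."""
    counts = token_counts(spec)
    base = counts[0] ** 2
    return [n ** 2 / base for n in counts]
```

For the T2T-ViT-12 model, `token_counts("7x7-10x10-14x14")` gives `[49, 100, 196]`, and the attention-score matrix at the last exit is 16 times larger than at the first.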

Contents of each checkpoint:

**.pth.tar
├── model_state_dict: state dictionary of the model
├── flops: a list of the GFLOPs required when exiting at each exit
├── anytime_classification: top-1 accuracy of each exit
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
└── budgeted_batch_classification: results of budgeted batch classification (a two-item list; [0] and [1] are the two coordinates of a curve)
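To inspect these fields without running the evaluation script, a checkpoint can be opened as a regular dictionary. This sketch assumes each `*.pth.tar` file loads with `torch.load` into a dict with the keys listed above, and that `flops` and `anytime_classification` hold one scalar per exit. The helper names are hypothetical, and the sketch makes no claim about which axis of the budgeted-batch curve each coordinate list represents.

```python
from typing import Any, Dict, List, Tuple


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a *.pth.tar checkpoint on the CPU (requires PyTorch)."""
    import torch  # imported here so the helpers below also work without PyTorch

    return torch.load(path, map_location="cpu")


def summarize_exits(ckpt: Dict[str, Any]) -> List[str]:
    """One line per exit: cost of exiting there and its stored top-1 accuracy."""
    lines = []
    pairs = zip(ckpt["flops"], ckpt["anytime_classification"])
    for k, (gflops, top1) in enumerate(pairs, start=1):
        lines.append(f"exit {k}: {float(gflops):.2f} GFLOPs, top-1 {float(top1):.2f}")
    return lines


def budget_curve(ckpt: Dict[str, Any]) -> List[Tuple[float, float]]:
    """Pair up the two coordinate lists stored under 'budgeted_batch_classification'."""
    xs, ys = ckpt["budgeted_batch_classification"]
    return [(float(x), float(y)) for x, y in zip(xs, ys)]
```

For example, `print("\n".join(summarize_exits(load_checkpoint("PATH_TO_CHECKPOINTS"))))` would print one line for each of the three exits of T2T-ViT-12.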

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2

Evaluate Pre-trained Models

Read the evaluation results saved in the pre-trained models:

CUDA_VISIBLE_DEVICES=0 python inference.py --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in the pre-trained models and evaluate the model on the validation set:

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1
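Conceptually, this mode replays early exiting on the validation set: each image stops at the first exit whose confidence reaches the saved threshold, and accuracy and average cost are accumulated. The sketch below illustrates that bookkeeping on precomputed per-exit outputs. It assumes max-softmax confidence as the exit criterion, and `evaluate_with_thresholds` and its arguments are hypothetical rather than taken from `inference.py`.

```python
from typing import Sequence, Tuple


def evaluate_with_thresholds(
    confidences: Sequence[Sequence[float]],
    predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
    thresholds: Sequence[float],
    flops: Sequence[float],
) -> Tuple[float, float]:
    """Return (top-1 accuracy, average GFLOPs per image) under early exiting.

    confidences[i][k], predictions[i][k]: max softmax probability and predicted
        class of image i at exit k.
    thresholds[k]: confidence needed to stop at exit k (the last exit always answers).
    flops[k]: cost of exiting at exit k, as in the checkpoint's 'flops' list.
    """
    num_exits = len(flops)
    correct = 0
    total_flops = 0.0
    for conf_i, pred_i, label in zip(confidences, predictions, labels):
        for k in range(num_exits):
            if k == num_exits - 1 or conf_i[k] >= thresholds[k]:
                correct += int(pred_i[k] == label)
                total_flops += flops[k]
                break
    n = len(labels)
    return correct / n, total_flops / n
```

Raising a threshold sends more images to later, more expensive exits, which is how a single model trades accuracy against average cost.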

Determine the confidence thresholds on the training set and evaluate the model on the validation set:

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2
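One way to pick thresholds from training-set outputs is to fix the fraction of images that should leave at each exit and set each threshold to the confidence of the last image admitted under that quota. The sketch below implements this quota-based rule under stated assumptions: max-softmax confidences, a hypothetical `exit_fractions` list summing to 1, and hypothetical function names. `--eval_mode 2` may choose the fractions or break ties differently.

```python
from typing import List, Sequence


def fit_thresholds(
    confidences: Sequence[Sequence[float]],
    exit_fractions: Sequence[float],
) -> List[float]:
    """Choose one confidence threshold per exit (except the last).

    confidences[i][k]: max softmax probability of training image i at exit k.
    exit_fractions[k]: target share of images that should stop at exit k.
    """
    n = len(confidences)
    remaining = list(range(n))  # images that have not exited yet
    thresholds: List[float] = []
    for k in range(len(exit_fractions) - 1):
        quota = min(round(exit_fractions[k] * n), len(remaining))
        if quota <= 0:
            thresholds.append(float("inf"))  # nobody stops at this exit
            continue
        # Most confident remaining images at exit k come first.
        ranked = sorted(remaining, key=lambda i: confidences[i][k], reverse=True)
        threshold = confidences[ranked[quota - 1]][k]
        thresholds.append(threshold)
        # Images at or above the threshold exit here; the rest move on.
        remaining = [i for i in remaining if confidences[i][k] < threshold]
    return thresholds
```

The resulting list can then be applied to validation-set outputs with the same `confidence >= threshold` rule used at inference time; fitting on the training set keeps the validation set untouched when choosing thresholds.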

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   └── ...
└── val
    ├── folder 1 (class 1)
    ├── folder 2 (class 2)
    └── ...
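This layout follows the class-per-folder convention of `torchvision.datasets.ImageFolder`, which assigns label indices by sorting the class-folder names. The hypothetical check below (standard library only) verifies that `train` and `val` exist and contain the same class folders, so both splits map class names to the same indices. It assumes the data loader in this repository follows that convention.

```python
from pathlib import Path
from typing import Dict


def class_to_idx(split_dir: Path) -> Dict[str, int]:
    """Map class-folder names to indices in sorted order (the ImageFolder convention)."""
    classes = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    return {name: i for i, name in enumerate(classes)}


def check_layout(root: str) -> Dict[str, int]:
    """Verify ROOT/train and ROOT/val exist and share the same class folders."""
    root_path = Path(root)
    mappings = {}
    for split in ("train", "val"):
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        mappings[split] = class_to_idx(split_dir)
    if mappings["train"] != mappings["val"]:
        raise ValueError("train and val contain different class folders")
    return mappings["train"]
```

Running `check_layout("PATH_TO_DATASET")` before evaluation catches a missing split or a class folder present in only one split, either of which would shift the label indices.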

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our T2T-ViT code comes from here.

To Do

Update the code for training.
