As-ViT: Auto-scaling Vision Transformers without Training


MIT licensed

Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou

In ICLR 2022.

Note: This code base implements the topology search (Sec. 3.3) and scaling (Sec. 3.4) in PyTorch. Our training code is based on TensorFlow and Keras on TPU and will be released soon.

Overview

We present As-ViT, a framework that unifies automatic architecture design and scaling for Vision Transformers (ViTs) in a training-free manner.

Highlights:

  • Training-free ViT Architecture Design: we design a "seed" ViT topology with a training-free search process. This extremely fast search is made possible by our comprehensive study of ViT network complexity (length distortion), which yields a strong Kendall-tau correlation with ground-truth accuracies.
  • Training-free ViT Architecture Scaling: starting from the "seed" topology, we automate the scaling rule for ViTs by growing the widths/depths of different ViT layers. This generates a series of architectures with different numbers of parameters in a single run.
  • Efficient ViT Training via Progressive Tokenization: we observe that ViTs can tolerate coarse tokenization in early training stages, and we further propose a progressive tokenization strategy to train ViTs faster and more cheaply.
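Length distortion, in the usual curve-based sense, measures how much a network stretches a one-dimensional path of inputs: the length of the path's image under the network divided by the length of the input path. The sketch below estimates this for a randomly initialized ReLU MLP by sampling points on a straight segment and summing distances between consecutive outputs. It is an illustration under our own assumptions (an MLP stands in for a ViT, He-style Gaussian initialization, and the helpers `random_mlp`, `forward`, `curve_length`, and `length_distortion` are hypothetical names); it is not the repository's implementation of the metric.

```python
import math
import random


def random_mlp(widths, seed=0):
    """Hypothetical helper: random ReLU MLP as a list of (weights, bias) pairs.

    Weights ~ N(0, 2 / fan_in) (He-style init, an assumption); biases are zero.
    """
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        std = math.sqrt(2.0 / fan_in)
        weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """Apply each affine layer, with ReLU after every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def curve_length(points):
    """Length of the polyline through consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def length_distortion(layers, x0, x1, num_points=200):
    """Output-path length divided by input-path length along the segment x0 -> x1."""
    ts = [k / (num_points - 1) for k in range(num_points)]
    inputs = [[a + t * (c - a) for a, c in zip(x0, x1)] for t in ts]
    outputs = [forward(layers, x) for x in inputs]
    return curve_length(outputs) / curve_length(inputs)


if __name__ == "__main__":
    net = random_mlp([8, 64, 64, 8], seed=0)
    print(f"length distortion: {length_distortion(net, [1.0] * 8, [-1.0] * 8):.3f}")
```

Averaging this ratio over several random initializations and segments gives a training-free score that can be compared across candidate architectures.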

Teaser figure. Left: Length distortion shows a strong correlation with ViT accuracy. Middle: Auto-scaling rule of As-ViT. Right: Progressive re-tokenization for efficient ViT training.
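Why coarse tokenization is cheaper can be seen from simple arithmetic: splitting an H×W image into non-overlapping P×P patches yields (H/P)·(W/P) tokens, and the attention term of self-attention grows roughly quadratically with the token count. The sketch below computes this and pairs it with a coarse-to-fine schedule. The 224×224 resolution, the patch sizes, and the `coarse_to_fine_schedule` helper are hypothetical choices for illustration, not necessarily how As-ViT re-tokenizes during training.

```python
def num_tokens(height, width, patch):
    """Tokens produced by non-overlapping patch x patch tokenization."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def relative_attention_cost(tokens, reference_tokens):
    """Attention FLOPs scale roughly with tokens**2; return the ratio to a reference."""
    return (tokens / reference_tokens) ** 2


def coarse_to_fine_schedule(total_epochs, patch_sizes=(32, 16)):
    """Hypothetical schedule: split epochs evenly, coarsest patches first.

    Returns (start_epoch, end_epoch, patch_size) tuples; the last stage
    absorbs any remainder epochs.
    """
    per_stage = total_epochs // len(patch_sizes)
    schedule = []
    for i, patch in enumerate(patch_sizes):
        start = i * per_stage
        end = total_epochs if i == len(patch_sizes) - 1 else start + per_stage
        schedule.append((start, end, patch))
    return schedule


if __name__ == "__main__":
    fine = num_tokens(224, 224, 16)  # 14 * 14 = 196 tokens
    for patch in (32, 16):
        n = num_tokens(224, 224, patch)
        print(patch, n, relative_attention_cost(n, fine))
    print(coarse_to_fine_schedule(300))
```

Doubling the patch size cuts the token count by 4x and the attention cost by roughly 16x, which is the kind of saving an early coarse stage can exploit.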

Prerequisites

  • Ubuntu 18.04
  • Python 3.6.9
  • CUDA 11.0 (lower versions may work but were not tested)
  • NVIDIA GPU + cuDNN v7.6

This repository has been tested on V100 GPUs. Configurations may need to be changed on other platforms.

Installation

  • Clone this repo:
git clone https://github.com/VITA-Group/AsViT.git
cd AsViT
  • Install dependencies:
pip install -r requirements.txt

1. Seed As-ViT Topology Search

CUDA_VISIBLE_DEVICES=0 python ./search/reinforce.py --save_dir ./output/REINFORCE-imagenet --data_path /path/to/imagenet

This job returns a seed topology. For example, our searched seed topology is 8,2,3|4,1,2|4,1,4|4,1,6|32, which is interpreted as follows:

  • Stage 1: kernel K1 = 8, split S1 = 2, expansion E1 = 3
  • Stage 2: kernel K2 = 4, split S2 = 1, expansion E2 = 2
  • Stage 3: kernel K3 = 4, split S3 = 1, expansion E3 = 4
  • Stage 4: kernel K4 = 4, split S4 = 1, expansion E4 = 6
  • Head: 32
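The seed string can be decoded with a few lines of Python: `|` separates the four stages and the head, and `,` separates the per-stage values. The sketch below is a hypothetical helper (`parse_seed_topology` and `StageConfig` are our names, not the repository's) that follows the layout described above and does not interpret the values beyond their names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StageConfig:
    kernel: int     # K_i
    split: int      # S_i
    expansion: int  # E_i


def parse_seed_topology(arch: str) -> Tuple[List[StageConfig], int]:
    """Parse 'K1,S1,E1|K2,S2,E2|K3,S3,E3|K4,S4,E4|Head' into stages and head."""
    *stage_fields, head_field = arch.split("|")
    if len(stage_fields) != 4:
        raise ValueError(f"expected 4 stages, got {len(stage_fields)}")
    stages = []
    for field in stage_fields:
        kernel, split, expansion = (int(v) for v in field.split(","))
        stages.append(StageConfig(kernel, split, expansion))
    return stages, int(head_field)


stages, head = parse_seed_topology("8,2,3|4,1,2|4,1,4|4,1,6|32")
for i, s in enumerate(stages, start=1):
    print(f"Stage{i}: kernel={s.kernel} split={s.split} expansion={s.expansion}")
print(f"Head: {head}")
```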

2. Scaling

CUDA_VISIBLE_DEVICES=0 python ./search/grow.py --save_dir ./output/GROW-imagenet \
--arch "[arch]" --data_path /path/to/imagenet

Here [arch] is the seed topology (the output of step 1 above). This job returns a series of topologies. For example, our largest topology (As-ViT Large) is 8,2,3,5|4,1,2,2|4,1,4,5|4,1,6,2|32,180, which is interpreted as follows:

  • Stage 1: kernel K1 = 8, split S1 = 2, expansion E1 = 3, layers L1 = 5
  • Stage 2: kernel K2 = 4, split S2 = 1, expansion E2 = 2, layers L2 = 2
  • Stage 3: kernel K3 = 4, split S3 = 1, expansion E3 = 4, layers L3 = 5
  • Stage 4: kernel K4 = 4, split S4 = 1, expansion E4 = 6, layers L4 = 2
  • Head: 32
  • Initial hidden size: 180
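A scaled topology adds a layer count L_i to each stage and an initial hidden size after the head. The sketch below is a hypothetical parser (`parse_scaled_topology` and `ScaledStage` are our names, not the repository's) that decodes the As-ViT Large string, sums the per-stage depths, and checks that, in this example, each stage's (K, S, E) values are unchanged from the seed topology of step 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScaledStage:
    kernel: int     # K_i
    split: int      # S_i
    expansion: int  # E_i
    layers: int     # L_i: number of layers in this stage


def parse_scaled_topology(arch: str) -> Tuple[List[ScaledStage], int, int]:
    """Parse 'K,S,E,L|...(4 stages)...|Head,InitialHiddenSize'."""
    *stage_fields, tail = arch.split("|")
    if len(stage_fields) != 4:
        raise ValueError(f"expected 4 stages, got {len(stage_fields)}")
    stages = [ScaledStage(*(int(v) for v in field.split(","))) for field in stage_fields]
    head, hidden_size = (int(v) for v in tail.split(","))
    return stages, head, hidden_size


large = "8,2,3,5|4,1,2,2|4,1,4,5|4,1,6,2|32,180"
seed = "8,2,3|4,1,2|4,1,4|4,1,6|32"

stages, head, hidden_size = parse_scaled_topology(large)
total_layers = sum(s.layers for s in stages)  # 5 + 2 + 5 + 2 = 14
print(f"As-ViT Large: {total_layers} layers, head={head}, initial hidden size={hidden_size}")

# Compare each stage's (K, S, E) against the seed's first four '|' fields.
seed_kse = [tuple(int(v) for v in field.split(",")) for field in seed.split("|")[:4]]
print(seed_kse == [(s.kernel, s.split, s.expansion) for s in stages])
```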

3. Evaluation

TensorFlow and Keras code for training on TPU will be released soon.

Citation

@inproceedings{chen2021asvit,
  title={Auto-scaling Vision Transformers without Training},
  author={Chen, Wuyang and Huang, Wei and Du, Xianzhi and Song, Xiaodan and Wang, Zhangyang and Zhou, Denny},
  booktitle={International Conference on Learning Representations},
  year={2022}
}
Owner: VITA, Visual Informatics Group @ University of Texas at Austin