Post-Training Quantization for Vision Transformers

PTQ4ViT

PTQ4ViT is a post-training quantization framework for vision transformers. It uses a twin uniform quantization method to reduce the quantization error on activations with special distributions (the post-softmax and post-GELU values), and a Hessian guided metric to evaluate candidate scaling factors, which improves calibration accuracy at a small cost. The quantized vision transformers (ViT, DeiT, and Swin) achieve near-lossless prediction accuracy (less than a 0.5% drop at 8-bit quantization) on the ImageNet classification task. Please read the paper for details.
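
For intuition, here is a minimal sketch of the twin uniform quantization idea on post-softmax activations: the value range is split into a fine-grained region for the many near-zero values and a coarse region for the few large ones, each quantized with its own uniform scale. The function name, default scales, and range split below are illustrative assumptions, not the repository's implementation; the Hessian guided search for the actual scaling factors is described in the paper.

```python
import torch

def twin_uniform_quantize(x, n_bits=6, delta_small=None):
    """Conceptual sketch of twin uniform quantization for post-softmax
    activations (values in [0, 1]); NOT the repository's implementation.

    A fine range R1 covers the many near-zero softmax values, a coarse
    range R2 covers the few large ones. PTQ4ViT searches the scaling
    factors with a Hessian guided metric; here they are fixed for clarity.
    """
    levels = 2 ** (n_bits - 1)                 # quantization levels per range
    if delta_small is None:
        delta_small = 1.0 / (levels * levels)  # assumed fine scale for R1
    delta_large = 1.0 / levels                 # coarse scale so R2 spans [0, 1]

    # Quantize with both scales ...
    q_small = torch.clamp(torch.round(x / delta_small), 0, levels - 1) * delta_small
    q_large = torch.clamp(torch.round(x / delta_large), 0, levels - 1) * delta_large

    # ... and keep the fine-grained result only where the value falls in R1.
    in_r1 = x < levels * delta_small
    return torch.where(in_r1, q_small, q_large)

# Example: attention weights are mostly tiny, with a few values near 1.
attn = torch.softmax(torch.randn(4, 8), dim=-1)
print(twin_uniform_quantize(attn, n_bits=6))
```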

Install

Requirements

  • python>=3.5
  • pytorch>=1.5
  • matplotlib
  • pandas
  • timm

Datasets

To run the example tests, place your ImageNet2012 dataset at /datasets/imagenet.

We use ViTImageNetLoaderGenerator in utils/datasets.py to initialize the DataLoader. If your ImageNet dataset is stored elsewhere, pass its root directory as an argument when instantiating ViTImageNetLoaderGenerator, as sketched below.
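
For example, a custom root would be passed roughly as follows. Apart from the module path and class name, the argument names, their order, and the test_loader helper are assumptions made for illustration; the authoritative signature is in utils/datasets.py and its use in example/test_all.py.

```python
from utils.datasets import ViTImageNetLoaderGenerator

# All arguments below are assumed for illustration -- check utils/datasets.py
# and example/test_all.py for the real constructor signature.
loader_gen = ViTImageNetLoaderGenerator(
    "/my/datasets/imagenet",  # dataset root (the example scripts default to /datasets/imagenet)
    "imagenet",               # dataset name (assumed)
    32,                       # train batch size (assumed)
    32,                       # test batch size (assumed)
)
test_loader = loader_gen.test_loader()  # assumed helper returning the evaluation DataLoader
```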

Usage

1. Run example quantization

To test on all models with BasePTQ/PTQ4ViT, run

python example/test_all.py

To run ablation testing, run

python example/test_ablation.py

You can run the testing scripts with multiple GPUs. For example, calling

python example/test_all.py --multigpu --n_gpu 6

will use 6 GPUs to run the test.

2. Download quantized model checkpoints

(Coming soon)

Results

All numbers below are ImageNet top-1 accuracy (%).

Results of BasePTQ

| Model | Original | W8A8 | W6A6 |
|:--|:--:|:--:|:--:|
| ViT-S/224/32 | 75.99 | 73.61 | 60.144 |
| ViT-S/224 | 81.39 | 80.468 | 70.244 |
| ViT-B/224 | 84.54 | 83.896 | 75.668 |
| ViT-B/384 | 86.00 | 85.352 | 46.886 |
| DeiT-S/224 | 79.80 | 77.654 | 72.268 |
| DeiT-B/224 | 81.80 | 80.946 | 78.786 |
| DeiT-B/384 | 83.11 | 82.33 | 68.442 |
| Swin-T/224 | 81.39 | 80.962 | 78.456 |
| Swin-S/224 | 83.23 | 82.758 | 81.742 |
| Swin-B/224 | 85.27 | 84.792 | 83.354 |
| Swin-B/384 | 86.44 | 86.168 | 85.226 |

Results of PTQ4ViT

| Model | Original | W8A8 | W6A6 |
|:--|:--:|:--:|:--:|
| ViT-S/224/32 | 75.99 | 75.582 | 71.908 |
| ViT-S/224 | 81.39 | 81.002 | 78.63 |
| ViT-B/224 | 84.54 | 84.25 | 81.65 |
| ViT-B/384 | 86.00 | 85.828 | 83.348 |
| DeiT-S/224 | 79.80 | 79.474 | 76.282 |
| DeiT-B/224 | 81.80 | 81.482 | 80.25 |
| DeiT-B/384 | 83.11 | 82.974 | 81.55 |
| Swin-T/224 | 81.39 | 81.246 | 80.47 |
| Swin-S/224 | 83.23 | 83.106 | 82.38 |
| Swin-B/224 | 85.27 | 85.146 | 84.012 |
| Swin-B/384 | 86.44 | 86.394 | 85.388 |

Results of Ablation

  • ViT-S/224 (original top-1 accuracy 81.39%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8 | W6A6 |
|:--:|:--:|:--:|:--:|:--:|
|   |   |   | 80.47 | 70.24 |
| ✓ |   |   | 80.93 | 77.20 |
| ✓ | ✓ |   | 81.11 | 78.57 |
| ✓ |   | ✓ | 80.84 | 76.93 |
|   | ✓ | ✓ | 79.25 | 74.07 |
| ✓ | ✓ | ✓ | 81.00 | 78.63 |

  • ViT-B/224 (original top-1 accuracy 84.54%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8 | W6A6 |
|:--:|:--:|:--:|:--:|:--:|
|   |   |   | 83.90 | 75.67 |
| ✓ |   |   | 83.97 | 79.90 |
| ✓ | ✓ |   | 84.07 | 80.76 |
| ✓ |   | ✓ | 84.10 | 80.82 |
|   | ✓ | ✓ | 83.40 | 78.86 |
| ✓ | ✓ | ✓ | 84.25 | 81.65 |

  • ViT-B/384 (original top-1 accuracy 86.00%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8 | W6A6 |
|:--:|:--:|:--:|:--:|:--:|
|   |   |   | 85.35 | 46.89 |
| ✓ |   |   | 85.42 | 79.99 |
| ✓ | ✓ |   | 85.67 | 82.01 |
| ✓ |   | ✓ | 85.60 | 82.21 |
|   | ✓ | ✓ | 84.35 | 80.86 |
| ✓ | ✓ | ✓ | 85.89 | 83.19 |

Citation

@article{PTQ4ViT_cvpr2022,
    title={PTQ4ViT: Post-Training Quantization Framework for Vision Transformers},
    author={Zhihang Yuan and Chenhao Xue and Yiqi Chen and Qiang Wu and Guangyu Sun},
    journal={arXiv preprint arXiv:2111.12293},
    year={2022},
}