LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

Last update: Jan 02, 2023

Related tags

Overview

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

This repository contains PyTorch evaluation code, training code and pretrained models for LeViT.

They obtain competitive tradeoffs in terms of speed / precision:

For details see LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference by Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou and Matthijs Douze.

If you use this code for a paper please cite:

@article{graham2021levit,
  title={LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference},
  author={Benjamin Graham and Alaaeldin El-Nouby and Hugo Touvron and Pierre Stock and Armand Joulin and Herv\'e J\'egou and Matthijs Douze},
  journal={arXiv preprint arXiv:22104.01136},
  year={2021}
}

Model Zoo

We provide baseline LeViT models trained with distllation on ImageNet 2012.

name	[email protected]	[email protected]	#FLOPs	#params	url
LeViT-128S	76.6	92.9	305M	7.8M	model
LeViT-128	78.6	94.0	406M	9.2M	model
LeViT-192	80.0	94.7	658M	11M	model
LeViT-256	81.6	95.4	1120M	19M	model
LeViT-384	82.6	96.0	2353M	39M	model

Usage

First, clone the repository locally:

git clone https://github.com/facebookresearch/levit.git

Then, install PyTorch 1.7.0+ and torchvision 0.8.1+ and pytorch-image-models:

conda install -c pytorch pytorch torchvision
pip install timm

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val folder respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class/2
      img4.jpeg

Evaluation

To evaluate a pre-trained LeViT-256 model on ImageNet val with a single GPU run:

python main.py --eval --model LeViT_256 --data-path /path/to/imagenet

This should give

* [email protected] 81.636 [email protected] 95.424 loss 0.750

Training

To train LeViT-256 on ImageNet with hard distillation on a single node with 8 gpus run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model LeViT_256 --data-path /path/to/imagenet --output_dir /path/to/save

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit

To train LeViT-256 model on ImageNet on one node with 8 gpus:

python run_with_submitit.py --model LeViT_256 --data-path /path/to/imagenet

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

Related tags

Overview

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Model Zoo

Usage

Data preparation

Evaluation

Training

Multinode training

License

Contributing

Owner

Facebook Research

Some code of the implements of Geological Modeling Using 3D Pixel-Adaptive and Deformable Convolutional Neural Network

ByteTrack超详细教程！训练自己的数据集&&摄像头实时检测跟踪

A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.

Rotated Box Is Back : Accurate Box Proposal Network for Scene Text Detection

Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

CLIP+FFT text-to-image

[ICCV 2021 Oral] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Code for Paper: Self-supervised Learning of Motion Capture

This repo includes the CUB-GHA (Gaze-based Human Attention) dataset and code of the paper "Human Attention in Fine-grained Classification".

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf

Pun Detection and Location

Code for CVPR 2021 paper TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search.

Reading Group @mila-iqia on Computational Optimal Transport for Machine Learning Applications

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

Spatially-Adaptive Pixelwise Networks for Fast Image Translation, CVPR 2021

Dense Gaussian Processes for Few-Shot Segmentation

A PyTorch implementation for Unsupervised Domain Adaptation by Backpropagation(DANN), support Office-31 and Office-Home dataset

PyTorch reimplementation of hand-biomechanical-constraints (ECCV2020)

Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition