Conformer: Local Features Coupling Global Representations for Visual Recognition (arxiv)

This repository is built upon DeiT and timm.

Usage

First, install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
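
To confirm the environment matches the versions listed above, something like the following sketch may help. It only reads package metadata and does not import the libraries. The helper names `version_tuple` and `check_environment` are illustrative and not part of this repository; treating timm as an exact pin follows the `timm==0.3.2` install line.

```python
from importlib import metadata

# Versions taken from the install note above; timm is pinned exactly.
REQUIRED = {"torch": "1.7.0", "torchvision": "0.8.1", "timm": "0.3.2"}


def version_tuple(version):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0), ignoring build suffixes."""
    core = version.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '1.10' to (1, 10, 0)
        parts.append(0)
    return tuple(parts)


def check_environment(required=REQUIRED):
    """Return {package: (installed_version_or_None, ok)} for each required package."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        if name == "timm":
            ok = version_tuple(installed) == version_tuple(wanted)
        else:
            ok = version_tuple(installed) >= version_tuple(wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        print(name, installed, "OK" if ok else "check version")
```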

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder, with the training data in the train/ folder and the validation data in the val/ folder:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
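
Before starting a long run, it can be worth checking that the extracted data matches the layout above. The following is a minimal sketch using only the standard library; `check_imagefolder_layout` is a hypothetical helper, not something this repository provides, and it only checks that the folders exist and are non-empty (it does not validate image file extensions the way datasets.ImageFolder does).

```python
from pathlib import Path


def check_imagefolder_layout(root):
    """Return a list of problems found under root; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    class_names = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each sub-folder of train/ and val/ is treated as one class.
        classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not classes:
            problems.append(f"no class folders in {split_dir}")
        for name in classes:
            if not any(f.is_file() for f in (split_dir / name).iterdir()):
                problems.append(f"empty class folder: {split_dir / name}")
        class_names[split] = classes
    # Both splits are expected to use the same class folder names.
    if len(class_names) == 2 and class_names["train"] != class_names["val"]:
        problems.append("train/ and val/ do not contain the same class folders")
    return problems
```

Calling `check_imagefolder_layout("/path/to/imagenet/")` returns an empty list when both splits are present and share the same class folders.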

Training

To train Conformer-S on ImageNet on a single node with 8 GPUs for 300 epochs, run:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
OUTPUT='./output/Conformer_small_patch16_batch_1024_lr1e-3_300epochs'

python -m torch.distributed.launch --master_port 50130 --nproc_per_node=8 --use_env main.py \
                                   --model Conformer_small_patch16 \
                                   --data-set IMNET \
                                   --batch-size 128 \
                                   --lr 0.001 \
                                   --num_workers 4 \
                                   --data-path /data/user/Dataset/ImageNet_ILSVRC2012/ \
                                   --output_dir ${OUTPUT} \
                                   --epochs 300
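
The output directory name mentions a batch size of 1024 while the command passes --batch-size 128. A small sketch of the arithmetic, assuming --batch-size is the per-process (per-GPU) batch size as in DeiT, which this repository builds on; `effective_batch_size` is an illustrative helper, not part of main.py:

```python
def effective_batch_size(per_process_batch, nproc_per_node, nnodes=1):
    """Total number of samples per optimizer step across all processes.

    Assumes each launched process (one per GPU) loads its own batch of
    per_process_batch samples.
    """
    return per_process_batch * nproc_per_node * nnodes


# Values from the command above: --batch-size 128, --nproc_per_node=8, one node.
print(effective_batch_size(128, 8))  # 1024
```

When training on fewer GPUs, keeping this product at 1024 (for example by raising --batch-size, memory permitting) is one way to stay close to the configuration shown above.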

Model Zoo

Model	Parameters	MACs	Top-1 Acc	Link
Conformer-Ti	23.5 M	5.2 G	81.3 %	baidu(code: hzhm) google
Conformer-S	37.7 M	10.6 G	83.4 %	baidu(code: qvu8) google
Conformer-B	83.3 M	23.3 G	84.1 %	baidu(code: b4z9) google

Citation

@article{peng2021conformer,
      title={Conformer: Local Features Coupling Global Representations for Visual Recognition}, 
      author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
      journal={arXiv preprint arXiv:2105.03889},
      year={2021},
}
