PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers.

Last update: Oct 09, 2022

Overview

Dynamic Token Normalization Improves Vision Transformers

This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers. Code and models will be available soon.

Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages of both LayerNorm and InstanceNorm. DTN can be seamlessly plugged into various transformer models and consistently improves their performance.
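The following NumPy sketch makes the LayerNorm/InstanceNorm contrast concrete. It is a simplified illustration, not the DTN formulation from the paper or this repository. It assumes a single scalar mixing weight `lam` and plain averaging over tokens for the InstanceNorm-style statistics. The function name `dtn_like_norm` is hypothetical.

```python
import numpy as np


def dtn_like_norm(x, lam, gamma, beta, eps=1e-5):
    """Simplified blend of LayerNorm-style and InstanceNorm-style statistics.

    Hypothetical illustration only, not the official DTN implementation.
    x:     token embeddings, shape (N, T, C) = (batch, tokens, channels)
    lam:   mixing weight in [0, 1]; 1.0 gives pure LayerNorm statistics,
           0.0 gives pure InstanceNorm (per-channel, across-token) statistics
    gamma, beta: per-channel affine parameters, shape (C,)
    """
    # Intra-token statistics (LayerNorm): one mean/var per token, over channels.
    mu_ln = x.mean(axis=-1, keepdims=True)   # (N, T, 1)
    var_ln = x.var(axis=-1, keepdims=True)   # (N, T, 1)
    # Inter-token statistics (InstanceNorm): one mean/var per channel, over tokens.
    mu_in = x.mean(axis=1, keepdims=True)    # (N, 1, C)
    var_in = x.var(axis=1, keepdims=True)    # (N, 1, C)
    # Convex combination of both statistics; broadcasts to (N, T, C).
    mu = lam * mu_ln + (1.0 - lam) * mu_in
    var = lam * var_ln + (1.0 - lam) * var_in
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```

In a PyTorch model, the same idea would typically become an `nn.Module` that replaces `nn.LayerNorm`. Its mixing weight would be a learnable parameter, for example passed through a sigmoid so that it stays in [0, 1].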

Comparison of top-1 and top-5 accuracies (%) on the ImageNet validation set for ViT models trained with LayerNorm (LN) and DTN.

Model	Top-1 (%)	Top-5 (%)
ViT-T*-LN	72.3	91.4
ViT-T*-DTN	73.2	91.7
ViT-S*-LN	80.6	95.2
ViT-S*-DTN	81.7	95.8
ViT-B*-LN	81.7	95.8
ViT-B*-DTN	82.5	96.1
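To check the size of the improvement, the per-model gains of DTN over LN can be computed directly from the table. The dictionary below simply re-enters the reported numbers. The helper name `gains` is illustrative.

```python
# Top-1 / Top-5 accuracies (%) copied from the table above.
results = {
    "ViT-T*": {"LN": (72.3, 91.4), "DTN": (73.2, 91.7)},
    "ViT-S*": {"LN": (80.6, 95.2), "DTN": (81.7, 95.8)},
    "ViT-B*": {"LN": (81.7, 95.8), "DTN": (82.5, 96.1)},
}


def gains(table):
    """Return {model: (top1_gain, top5_gain)} of DTN over LN, in points."""
    out = {}
    for model, row in table.items():
        ln, dtn = row["LN"], row["DTN"]
        # Round to one decimal to match the precision of the reported numbers.
        out[model] = tuple(round(d - l, 1) for d, l in zip(dtn, ln))
    return out


for model, (d1, d5) in gains(results).items():
    print(f"{model}: top-1 +{d1:.1f}, top-5 +{d5:.1f}")
```

With these numbers, the top-1 gains are 0.9, 1.1 and 0.8 points for ViT-T*, ViT-S* and ViT-B*.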

Getting Started

Install PyTorch

Clone the repo:

git clone https://github.com/dtn-anonymous/DTN.git

Requirements

Install CUDA==10.1 with cuDNN 7, following the official installation instructions.
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.2:

pip install timm==0.3.2
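Because the pinned versions matter, the following stdlib-only sketch reports which of them differ from what is installed. It assumes Python 3.8+ (for `importlib.metadata`) and that the packages expose standard Python package metadata. The helper name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the Requirements section.
PINNED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.3.2"}


def check_versions(pinned=PINNED):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for pkg, wanted in pinned.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None
        # Installed strings may carry a local suffix such as "1.7.1+cu101".
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[pkg] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    print("environment OK" if not problems else f"version mismatches: {problems}")
```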

Data Preparation

Download the ImageNet dataset. It should contain the train and val directories, along with the txt file that maps images to their labels.
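This README does not document the format of the label txt file. The parser below is a sketch under an assumption: it expects the common ImageNet list convention of one `<relative image path> <integer label>` pair per line. Check this against the files you actually use. The function name `read_label_file` is hypothetical.

```python
from pathlib import Path


def read_label_file(txt_path, image_root):
    """Parse an assumed '<relative/image/path> <integer label>' list file.

    Returns a list of (image_path, label) tuples, with each path joined onto
    image_root. Blank lines are skipped; malformed lines raise ValueError.
    """
    root = Path(image_root)
    samples = []
    with open(txt_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                # rsplit so that image paths containing spaces still parse.
                rel_path, label = line.rsplit(maxsplit=1)
                samples.append((root / rel_path, int(label)))
            except ValueError as err:
                raise ValueError(f"{txt_path}:{line_no}: malformed line {line!r}") from err
    return samples
```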

Training a model from scratch

An example of training our DTN is given in DTN/scripts/train.sh. To train ViT-S* with DTN, run:

cd DTN/scripts   
sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

The number of GPUs and the configuration file can be changed in train.sh.


Owner

Wenqi Shao