CoaT: Co-Scale Conv-Attentional Image Transformers

Last update: Dec 03, 2022

Introduction

This repository contains the official code and pre-trained models for CoaT: Co-Scale Conv-Attentional Image Transformers. CoaT introduces (1) a co-scale mechanism that realizes fine-to-coarse, coarse-to-fine, and cross-scale attention modeling, and (2) an efficient conv-attention module that realizes relative position encoding within factorized attention.
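
For orientation, a factorized attention with a convolutional relative position term can be written roughly as follows. This is an illustrative sketch, not a restatement of the paper: the symbols (queries $Q$, keys $K$, and values $V$ of shape $N \times C$, and a depthwise convolution $\mathrm{DWConv}$ applied to $V$ in its 2D spatial layout) are assumptions introduced here, and the paper remains the reference for the exact formulation.

```latex
\begin{aligned}
\mathrm{FactorAtt}(X) &= \frac{Q}{\sqrt{C}} \left( \mathrm{softmax}(K)^{\top} V \right) \\
\mathrm{ConvAtt}(X)   &= \frac{Q}{\sqrt{C}} \left( \mathrm{softmax}(K)^{\top} V \right) + Q \circ \mathrm{DWConv}(V)
\end{aligned}
```

Computing $\mathrm{softmax}(K)^{\top} V$ first yields a $C \times C$ matrix, so the cost grows as $O(NC^2)$ in the number of tokens $N$ instead of the $O(N^2 C)$ of standard attention.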

For more details, please refer to the paper "CoaT: Co-Scale Conv-Attentional Image Transformers" (arXiv:2104.06399) by Weijian Xu*, Yifan Xu*, Tyler Chang, and Zhuowen Tu.

Changelog

04/23/2021: Pre-trained checkpoint for CoaT-Lite Mini is released.
04/22/2021: Code and pre-trained checkpoint for CoaT-Lite Tiny are released.

Usage

Environment Preparation

Set up a new conda environment and activate it.

# Create an environment with Python 3.8.
conda create -n coat python=3.8
conda activate coat

Install required packages.

# Install PyTorch 1.7.1 w/ CUDA 11.0.
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

# Install timm 0.3.2.
pip install timm==0.3.2

# Install einops.
pip install einops
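
After installation, it can help to confirm that the environment matches the pinned versions. The snippet below is a minimal sketch using only the standard library; the helper name `check_versions` and the `PINNED` table are hypothetical, not part of this repository. It strips local build tags such as `+cu110` before comparing, and it skips einops because no version is pinned for it above.

```python
from importlib import metadata

# Versions pinned by the install commands above (einops is not pinned).
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "torchaudio": "0.7.2",
    "timm": "0.3.2",
}


def check_versions(pinned=PINNED, lookup=metadata.version):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local build tag such as "+cu110" before comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} installed={installed} pinned={PINNED[name]} [{status}]")
```

Run it inside the activated `coat` environment; any line marked `MISMATCH` points at a package that is missing or differs from the version listed above.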

Code and Dataset Preparation

Clone the repo.

git clone https://github.com/mlpc-ucsd/CoaT
cd CoaT

Download ImageNet dataset (ILSVRC 2012) and extract.

# Create dataset folder.
mkdir -p ./data/ImageNet

# Download the dataset (not shown here) and copy the files, assuming the download directory is stored in $DATASET_PATH.
cp "$DATASET_PATH"/ILSVRC2012_img_train.tar "$DATASET_PATH"/ILSVRC2012_img_val.tar "$DATASET_PATH"/ILSVRC2012_devkit_t12.tar.gz ./data/ImageNet

# Extract the dataset.
python -c "from torchvision.datasets import ImageNet; ImageNet('./data/ImageNet', split='train')"
python -c "from torchvision.datasets import ImageNet; ImageNet('./data/ImageNet', split='val')"
# After the extraction, you should observe `train` and `val` folders under ./data/ImageNet.
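
To sanity-check the extraction, one option is to count the class folders in each split. The sketch below assumes the layout described above (`train` and `val` under `./data/ImageNet`, one subfolder per class) and the 1000 classes of ILSVRC 2012; the function names are hypothetical helpers, not part of this repository.

```python
from pathlib import Path


def count_class_folders(root, split):
    """Count the immediate subdirectories of <root>/<split> (one per class)."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        return 0
    return sum(1 for entry in split_dir.iterdir() if entry.is_dir())


def check_imagenet_layout(root="./data/ImageNet", expected_classes=1000):
    """Return {split: (class_folder_count, looks_complete)} for train and val."""
    report = {}
    for split in ("train", "val"):
        found = count_class_folders(root, split)
        report[split] = (found, found == expected_classes)
    return report


if __name__ == "__main__":
    for split, (found, ok) in check_imagenet_layout().items():
        print(f"{split}: {found} class folders ({'ok' if ok else 'unexpected'})")
```

A count other than 1000 for either split suggests that the extraction was interrupted or that the archives were placed in a different folder.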

Evaluate Pre-trained Checkpoint

We provide the CoaT checkpoints pre-trained on the ImageNet dataset.

Name	Acc@1	Acc@5	#Params	SHA-256 (first 8 chars)	URL
CoaT-Lite Tiny	77.5	93.8	5.7M	e88e96b0	model, log
CoaT-Lite Mini	79.1	94.5	11M	6b4a8ae5	model, log

The following commands provide an example (CoaT-Lite Tiny) to evaluate the pre-trained checkpoint.

# Download the pretrained checkpoint.
mkdir -p ./output/pretrained
wget http://vcl.ucsd.edu/coat/pretrained/coat_lite_tiny_e88e96b0.pth -P ./output/pretrained
sha256sum ./output/pretrained/coat_lite_tiny_e88e96b0.pth  # Make sure it matches the SHA-256 hash (first 8 characters) in the table.

# Evaluate.
# Usage: bash ./scripts/eval.sh [model name] [output folder] [checkpoint path]
bash ./scripts/eval.sh coat_lite_tiny coat_lite_tiny_pretrained ./output/pretrained/coat_lite_tiny_e88e96b0.pth
# The last lines of the output should report results similar to "Acc@1 77.504 Acc@5 93.814".
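
The hash comparison can also be automated instead of being read off the `sha256sum` output by eye. The following is a minimal sketch under one assumption: the checkpoint file name ends with an underscore followed by the first 8 hex characters of its SHA-256 digest, as in `coat_lite_tiny_e88e96b0.pth`. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_prefix(path, length=8, chunk_size=1 << 20):
    """Return the first `length` hex characters of the file's SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def matches_filename_hash(path, length=8):
    """Compare the digest prefix with the text after the last underscore in the file name."""
    expected = Path(path).stem.rsplit("_", 1)[-1].lower()
    return sha256_prefix(path, length) == expected


if __name__ == "__main__":
    checkpoint = "./output/pretrained/coat_lite_tiny_e88e96b0.pth"
    if Path(checkpoint).is_file():
        print("hash ok" if matches_filename_hash(checkpoint) else "hash MISMATCH")
```

A mismatch usually means the download was truncated or corrupted, in which case downloading the file again is the simplest fix.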

Train

The following commands provide an example (CoaT-Lite Tiny, 8-GPU) to train the CoaT model.

# Usage: bash ./scripts/train.sh [model name] [output folder]
bash ./scripts/train.sh coat_lite_tiny coat_lite_tiny

Evaluate

The following commands provide an example (CoaT-Lite Tiny) to evaluate the checkpoint after training.

# Usage: bash ./scripts/eval.sh [model name] [output folder] [checkpoint path]
bash ./scripts/eval.sh coat_lite_tiny coat_lite_tiny_eval ./output/coat_lite_tiny/checkpoints/checkpoint0299.pth
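
If training stopped before the final epoch, the checkpoint to evaluate may not be `checkpoint0299.pth`. The sketch below picks the highest-numbered checkpoint in a folder. The naming pattern `checkpoint<number>.pth` is an assumption inferred from the single file name in the command above, and `latest_checkpoint` is a hypothetical helper, not part of this repository.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the example "checkpoint0299.pth".
CHECKPOINT_PATTERN = re.compile(r"^checkpoint(\d+)\.pth$")


def latest_checkpoint(folder):
    """Return the path of the highest-numbered checkpoint in `folder`, or None."""
    best = None
    for entry in Path(folder).glob("checkpoint*.pth"):
        match = CHECKPOINT_PATTERN.match(entry.name)
        if match is None:
            continue  # Skip files that do not follow the assumed pattern.
        number = int(match.group(1))
        if best is None or number > best[0]:
            best = (number, entry)
    return best[1] if best else None


if __name__ == "__main__":
    print(latest_checkpoint("./output/coat_lite_tiny/checkpoints"))
```

The printed path can then be passed as the `[checkpoint path]` argument of `./scripts/eval.sh`.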

Citation

@misc{xu2021coscale,
      title={Co-Scale Conv-Attentional Image Transformers}, 
      author={Weijian Xu and Yifan Xu and Tyler Chang and Zhuowen Tu},
      year={2021},
      eprint={2104.06399},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This repository is released under the Apache License 2.0. The license can be found in the LICENSE file.

Acknowledgment

Thanks to DeiT and pytorch-image-models for a clear and data-efficient implementation of ViT. Thanks to lucidrains for the implementations of Lambda Networks and CPVT.
