DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

Overview

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: For detailed instructions and information, see classification/README.md.

Object Detection: For detailed instructions and information, see detection/README.md.

The paper has been released on [arXiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. It learns to adaptively split images into patches with different positions and scales in a data-driven way, rather than using predefined fixed patches. In this way, our method better preserves the semantics within each patch.
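
As a rough illustration of the idea above, the following pure-Python sketch samples a patch whose position and scale have been shifted away from the fixed grid. It is not the repository's implementation: the function names (`bilinear`, `sample_deformable_patch`), the parameters (`dx`, `dy`, `sw`, `sh`, `k`), and the choice of bilinear interpolation on a k x k grid of points are assumptions made for illustration. The offsets and scales are passed in by hand here, whereas in DePatch they are learned from data.

```python
import math


def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D map (list of rows) at continuous (x, y).

    Pixel centers sit at integer coordinates; locations outside the map count as 0.
    """
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feat[yi][xi]
    return value


def sample_deformable_patch(feat, cx, cy, dx, dy, sw, sh, k=2):
    """Sample k*k points uniformly inside a box of size (sw, sh).

    The box is centered at (cx + dx, cy + dy). In this sketch (dx, dy) and
    (sw, sh) are plain arguments (hypothetical stand-ins for learned values).
    """
    left = cx + dx - sw / 2
    top = cy + dy - sh / 2
    samples = []
    for i in range(k):
        for j in range(k):
            # Sampling points are the centers of a k x k grid of cells in the box.
            px = left + (j + 0.5) * sw / k
            py = top + (i + 0.5) * sh / k
            samples.append(bilinear(feat, px, py))
    return samples


# A 4x4 toy "image" whose value at (x, y) is 4*y + x.
feat = [[4 * r + c for c in range(4)] for r in range(4)]

# Zero offset and the default 2x2 size reproduce the fixed top-left patch.
fixed = sample_deformable_patch(feat, 0.5, 0.5, 0.0, 0.0, 2, 2)     # [0.0, 1.0, 4.0, 5.0]

# Shifting the box by half a pixel samples between the original pixels.
shifted = sample_deformable_patch(feat, 0.5, 0.5, 0.5, 0.5, 2, 2)   # [2.5, 3.5, 6.5, 7.5]
```

With zero offset and the default size, the sampled values equal the pixels of the fixed patch, so a fixed patch split is a special case of this sampling scheme.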

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we look forward to seeing DePatch applied to other recent architectures to promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs (G)	Acc@1 (%)
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle = {Proceedings of the ACM International Conference on Multimedia},
  year = {2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer to these repositories for further information.

Owner

CASIA-IVA-Lab