MlTr: Multi-label Classification with Transformer

This is official implement of "MlTr: Multi-label Classification with Transformer".

Abstract

The task of multi-label image classification is to recognize all the object labels presented in an image. Though advancing for years, small objects, similar objects and objects with high conditional probability are still the main bottlenecks of previous convolutional neural network(CNN) based models, limited by convolutional kernels' representational capacity. Recent vision transformer networks utilize the self-attention mechanism to extract the feature of pixel granularity, which expresses richer local semantic information, while is insufficient for mining global spatial dependence. In this paper, we point out the three crucial problems that CNN-based methods encounter and explore the possibility of conducting specific transformer modules to settle them. We put forward a Multi-label Transformer architecture(MlTr) constructed with windows partitioning, in-window pixel attention, cross-window attention, particularly improving the performance of multi-label image classification tasks. The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE with 88.5%, 95.8%, and 65.5% respectively.

Pretrained model (Results on MS-COCO2014)

name	resolution	map	params(M)	model	log
mltr-s	224x224	81.9	33	coming soon	coming soon
mltr-m	384x384	86.8	62	coming soon	coming soon
mltr-l	384x384	88.5	108	coming soon	coming soon

Citing artical

Pleadse cite this article as:

@misc{cheng2021mltr,
      title={MlTr: Multi-label Classification with Transformer}, 
      author={Xing Cheng and Hezheng Lin and Xiangyu Wu and Fan Yang and Dong Shen and Zhongyuan Wang and Nian Shi and Honglin Liu},
      year={2021},
      eprint={2106.06195},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Started

Please refer to get_started.

MlTr: Multi-label Classification with Transformer

Related tags

Overview

MlTr: Multi-label Classification with Transformer

Abstract

Pretrained model (Results on MS-COCO2014)

Citing artical

Started

Owner

程星

Detecting Potentially Harmful and Protective Suicide-related Content on Twitter

Code for NeurIPS 2021 paper 'Spatio-Temporal Variational Gaussian Processes'

Code for "Learning to Regrasp by Learning to Place"

Omnidirectional camera calibration in python

A python library for time-series smoothing and outlier detection in a vectorized way.

PyTorch implementation of Memory-based semantic segmentation for off-road unstructured natural environments.

An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in Pytorch.

Active Offline Policy Selection With Python

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models

Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Source code for our paper "Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures"

U-Net Brain Tumor Segmentation

Datasets, tools, and benchmarks for representation learning of code.

An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities.

Source code for CIKM 2021 paper for Relation-aware Heterogeneous Graph for User Profiling

TensorFlow ROCm port

PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"

Blind Video Temporal Consistency via Deep Video Prior

TransVTSpotter: End-to-end Video Text Spotter with Transformer

QilingLab challenge writeup