We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

Last update: Nov 08, 2022

ConTNet

Introduction

ConTNet (Convolution-Transformer Network) is proposed mainly in response to two issues: (1) ConvNets lack a large receptive field, which limits their performance on downstream tasks. (2) Transformer-based models are not robust enough and require special training settings or hundreds of millions of images for pretraining, which limits their adoption. ConTNet alternates convolution and transformer layers. As a result, it is robust and can be optimized like ResNet, unlike recently proposed transformer-based models (e.g., ViT, DeiT), which are sensitive to hyper-parameters and need many tricks when trained from scratch on a midsize dataset (e.g., ImageNet).
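Issue (1) can be made concrete with the standard recurrence for the theoretical receptive field of stacked convolutions. The snippet below is a minimal sketch, not code from this repo: the helper name `receptive_field` and the example layer stacks are illustrative assumptions, and dilation and padding effects are ignored.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels, along one axis) of a
    stack of conv/pool layers given as (kernel_size, stride) tuples.

    Hypothetical helper for illustration; dilation is not modelled.
    """
    rf = 1    # one input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the window by (k - 1) jumps
        jump *= stride             # striding spreads later outputs further apart
    return rf


# Three stacked 3x3, stride-1 convolutions cover a 7x7 window.
print(receptive_field([(3, 1)] * 3))  # 7

# An assumed stem: a 7x7 stride-2 conv followed by a 3x3 stride-2 pooling layer.
print(receptive_field([(7, 2), (3, 2)]))  # 11
```

With stride 1, the window grows only linearly in the number of stacked layers, whereas a self-attention layer relates every token it is applied to in a single step. That is the gap the convolution-plus-transformer design is meant to close.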

Main Results on ImageNet

name	resolution	acc@1 (%)	#params(M)	FLOPs(G)
Res-18	224x224	71.5	11.7	1.8
ConT-S	224x224	74.9	10.1	1.5
Res-50	224x224	77.1	25.6	4.0
ConT-M	224x224	77.6	19.2	3.1
Res-101	224x224	78.2	44.5	7.6
ConT-B	224x224	77.9	39.6	6.4
DeiT-Ti^*	224x224	72.2	5.7	1.3
ConT-Ti^*	224x224	74.9	5.8	0.8
Res-18^*	224x224	73.2	11.7	1.8
ConT-S^*	224x224	76.5	10.1	1.5
Res-50^*	224x224	78.6	25.6	4.0
DeiT-S^*	224x224	79.8	22.1	4.6
ConT-M^*	224x224	80.2	19.2	3.1
Res-101^*	224x224	80.0	44.5	7.6
DeiT-B^*	224x224	81.8	86.6	17.6
ConT-B^*	224x224	81.8	39.6	6.4

Note: ^* indicates training with strong augmentations.
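To read the trade-offs off the table more easily, the strong-augmentation rows can be compared pairwise. The following is a small sketch using only the numbers listed above. The `compare` helper and the choice of which models to pair are assumptions made for illustration, not pairings defined by the authors.

```python
# Strong-augmentation (^*) rows copied from the ImageNet table above:
# name -> (acc@1 in %, params in M, FLOPs in G)
RESULTS = {
    "DeiT-Ti": (72.2, 5.7, 1.3),
    "ConT-Ti": (74.9, 5.8, 0.8),
    "Res-18": (73.2, 11.7, 1.8),
    "ConT-S": (76.5, 10.1, 1.5),
    "Res-50": (78.6, 25.6, 4.0),
    "DeiT-S": (79.8, 22.1, 4.6),
    "ConT-M": (80.2, 19.2, 3.1),
    "Res-101": (80.0, 44.5, 7.6),
    "DeiT-B": (81.8, 86.6, 17.6),
    "ConT-B": (81.8, 39.6, 6.4),
}


def compare(model, baseline):
    """Accuracy gain (points) and params/FLOPs ratios of model vs. baseline."""
    acc, params, flops = RESULTS[model]
    b_acc, b_params, b_flops = RESULTS[baseline]
    return {
        "acc_gain": round(acc - b_acc, 1),
        "params_ratio": round(params / b_params, 2),
        "flops_ratio": round(flops / b_flops, 2),
    }


# Assumed pairings: each ConT variant against baselines of roughly similar size.
PAIRS = [
    ("ConT-Ti", "DeiT-Ti"),
    ("ConT-S", "Res-18"),
    ("ConT-M", "Res-50"),
    ("ConT-M", "DeiT-S"),
    ("ConT-B", "Res-101"),
    ("ConT-B", "DeiT-B"),
]

for model, baseline in PAIRS:
    print(f"{model} vs {baseline}: {compare(model, baseline)}")
```

A ratio below 1 means the ConT variant uses fewer parameters or FLOPs than the baseline it is paired with.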

Main Results on Downstream Tasks

Object detection results on COCO.

method	backbone	#params(M)	FLOPs(G)	AP	APs	APm	APl
RetinaNet	Res-50	32.0	235.6	36.5	20.4	40.3	48.1
RetinaNet	ConTNet-M	27.0	217.2	37.9	23.0	40.6	50.4
FCOS	Res-50	32.2	242.9	38.7	22.9	42.5	50.1
FCOS	ConTNet-M	27.2	228.4	40.8	25.1	44.6	53.0
Faster R-CNN	Res-50	41.5	241.0	37.4	21.2	41.0	48.1
Faster R-CNN	ConTNet-M	36.6	225.6	40.0	25.4	43.0	52.0

Instance segmentation results on Cityscapes based on Mask-RCNN.

backbone	AP^bb	AP_s^bb	AP_m^bb	AP_l^bb	AP^mk	AP_s^mk	AP_m^mk	AP_l^mk
Res-50	38.2	21.9	40.9	49.5	34.7	18.3	37.4	47.2
ConT-M	40.5	25.1	44.4	52.7	38.1	20.9	41.0	50.3

Semantic segmentation results on Cityscapes.

model	mIOU
PSP-Res50	77.12
PSP-ConTM	78.28

Bib Citing

@article{yan2021contnet,
    title={ConTNet: Why not use convolution and transformer at the same time?},
    author={Haotian Yan and Zhe Li and Weijian Li and Changhu Wang and Ming Wu and Chuang Zhang},
    year={2021},
    journal={arXiv preprint arXiv:2104.13497}
}
