An official implementation of the Anchor DETR.

Overview

Anchor DETR: Query Design for Transformer-Based Detector

Introduction

This repository is an official implementation of the Anchor DETR. We encode the anchor points as the object queries in DETR. Multiple patterns are attached to each anchor point to solve the difficulty: "one region, multiple objects". We also propose an attention variant RCDA to reduce the memory cost for high-resolution features.

DETR

Main Results

feature epochs AP GFLOPs Infer Speed (FPS)
DETR DC5 500 43.3 187 10 (12)
SMCA multi-level 50 43.7 152 10
Deformable DETR multi-level 50 43.8 173 15
Conditional DETR DC5 50 43.8 195 10
Anchor DETR DC5 50 44.3 151 16 (19)

Note:

  1. The results are based on ResNet-50 backbone.
  2. Inference speeds are measured on NVIDIA Tesla V100 GPU.
  3. The values in parentheses of the Infer Speed indicate the speed with torchscript optimization.

Model

name backbone AP URL
AnchorDETR-C5 R50 42.1 model / log
AnchorDETR-DC5 R50 44.3 model / log
AnchorDETR-C5 R101 43.5 model / log
AnchorDETR-DC5 R101 45.1 model / log

Note: the models and logs are also available at Baidu Netdisk with code hh13.

Usage

Installation

First, clone the repository locally:

git clone https://github.com/megvii-research/AnchorDETR.git

Then, install dependencies:

pip install -r requirements.txt

Training

To train AnchorDETR on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py  --coco_path /path/to/coco 

Evaluation

To evaluate AnchorDETR on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --eval --coco_path /path/to/coco --resume /path/to/checkpoint.pth 

To evaluate AnchorDETR with a single GPU:

python main.py --eval --coco_path /path/to/coco --resume /path/to/checkpoint.pth

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{wang2021anchor,
      title={Anchor DETR: Query Design for Transformer-Based Detector},
      author={Yingming Wang and Xiangyu Zhang and Tong Yang and Jian Sun},
      year={2021},
      eprint={2109.07107},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected].

Owner
MEGVII Research
Power Human with AI. 持续创新拓展认知边界 非凡科技成就产品价值
MEGVII Research
A collection of resources on GAN Inversion.

This repo is a collection of resources on GAN inversion, as a supplement for our survey

Meta Learning for Semi-Supervised Few-Shot Classification

few-shot-ssl-public Code for paper Meta-Learning for Semi-Supervised Few-Shot Classification. [arxiv] Dependencies cv2 numpy pandas python 2.7 / 3.5+

Mengye Ren 501 Jan 08, 2023
Train an imgs.ai model on your own dataset

imgs.ai is a fast, dataset-agnostic, deep visual search engine for digital art history based on neural network embeddings.

Fabian Offert 5 Dec 21, 2021
TensorLight - A high-level framework for TensorFlow

TensorLight is a high-level framework for TensorFlow-based machine intelligence applications. It reduces boilerplate code and enables advanced feature

Benjamin Kan 10 Jul 31, 2022
Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Spatio-Temporal Entropy Model A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression. More details can

16 Nov 28, 2022
Deep Learning to Improve Breast Cancer Detection on Screening Mammography

Shield: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Deep Learning to Improve Breast

Li Shen 305 Jan 03, 2023
Real time sign language recognition

The proposed work aims at converting american sign language gestures into English that can be understood by everyone in real time.

Mohit Kaushik 6 Jun 13, 2022
Doing the asl sign language classification on static images using graph neural networks.

SignLangGNN When GNNs 💜 MediaPipe. This is a starter project where I tried to implement some traditional image classification problem i.e. the ASL si

10 Nov 09, 2022
This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

slue-toolkit We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE

ASAPP Research 39 Sep 21, 2022
The project was to detect traffic signs, based on the Megengine framework.

trafficsign 赛题 旷视AI智慧交通开源赛道,初赛1/177,复赛1/12。 本赛题为复杂场景的交通标志检测,对五种交通标志进行识别。 框架 megengine 算法方案 网络框架 atss + resnext101_32x8d 训练阶段 图片尺寸 最终提交版本输入图片尺寸为(1500,2

20 Dec 02, 2022
Text to Image Generation with Semantic-Spatial Aware GAN

text2image This repository includes the implementation for Text to Image Generation with Semantic-Spatial Aware GAN This repo is not completely. Netwo

CVDDL 124 Dec 30, 2022
Unofficial keras(tensorflow) implementation of MAE model from Masked Autoencoders Are Scalable Vision Learners

MAE-keras Unofficial keras(tensorflow) implementation of MAE model described in 'Masked Autoencoders Are Scalable Vision Learners'. This work has been

Yewon 11 Jun 12, 2022
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Little Ball of Fur is a graph sampling extension library for Python. Please look at the Documentation, relevant Paper, Promo video and External Resour

Benedek Rozemberczki 619 Dec 14, 2022
Deep-learning-roadmap - All You Need to Know About Deep Learning - A kick-starter

Deep Learning - All You Need to Know Sponsorship To support maintaining and upgrading this project, please kindly consider Sponsoring the project deve

Instill AI 4.4k Dec 26, 2022
The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

FMFCC-A This project is the description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts. The FMFCC-A dataset is shared through BaiduCl

18 Dec 24, 2022
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data This is the official PyTorch implementation of the SeCo paper: @articl

ElementAI 101 Dec 12, 2022
This repo includes the supplementary of our paper "CEMENT: Incomplete Multi-View Weak-Label Learning with Long-Tailed Labels"

Supplementary Materials for CEMENT: Incomplete Multi-View Weak-Label Learning with Long-Tailed Labels This repository includes all supplementary mater

Zhiwei Li 0 Jan 05, 2022
Recreate CenternetV2 based on MMDET.

Introduction This project is trying to Recreate CenternetV2 based on MMDET, which is proposed in paper Probabilistic two-stage detection. This project

25 Dec 09, 2022
Speech Recognition using DeepSpeech2.

deepspeech.pytorch Implementation of DeepSpeech2 for PyTorch using PyTorch Lightning. The repo supports training/testing and inference using the DeepS

Sean Naren 2k Jan 04, 2023