End-to-End Object Detection with Fully Convolutional Network

Last update: Dec 22, 2022

Overview

This project provides a PyTorch implementation of "End-to-End Object Detection with Fully Convolutional Network".

The experiments in the paper were conducted on an internal framework, so we reimplemented them on cvpods and report the details below.

Requirements

cvpods
scipy >= 1.5.4

Get Started

Install cvpods locally (requires CUDA to compile)

python3 -m pip install 'git+https://github.com/Megvii-BaseDetection/cvpods.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/Megvii-BaseDetection/cvpods.git
python3 -m pip install -e cvpods

# Or,
pip install -r requirements.txt
python3 setup.py build develop

Prepare datasets

cd /path/to/cvpods
cd datasets
ln -s /path/to/your/coco/dataset coco

Train & Test

git clone https://github.com/Megvii-BaseDetection/DeFCN.git
cd DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms  # for example

# Train
pods_train --num-gpus 8

# Test (the MODEL.WEIGHTS and OUTPUT_DIR overrides are both optional)
pods_test --num-gpus 8 \
    MODEL.WEIGHTS /path/to/your/save_dir/ckpt.pth \
    OUTPUT_DIR /path/to/your/save_dir

# Multi node training
## sudo apt install net-tools ifconfig
pods_train --num-gpus 8 --num-machines N --machine-rank 0/1/.../N-1 --dist-url "tcp://MASTER_IP:port"
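
The `0/1/.../N-1` shorthand in the multi-node command means that each machine runs the same command with its own rank. A minimal sketch that expands it into one command per machine is shown below. It assumes every machine uses the same `--dist-url` pointing at the rank-0 machine; the helper name, the IP address, and the port are hypothetical placeholders and not part of cvpods.

```python
def multi_node_commands(num_machines, master_ip, port, gpus_per_node=8):
    """Build one pods_train command per machine, with ranks 0..num_machines-1.

    Hypothetical helper for illustration only; it uses just the flags
    shown in the multi-node command above.
    """
    # Assumption: all machines share one URL that points at the rank-0 machine.
    dist_url = f"tcp://{master_ip}:{port}"
    return [
        f"pods_train --num-gpus {gpus_per_node} --num-machines {num_machines} "
        f'--machine-rank {rank} --dist-url "{dist_url}"'
        for rank in range(num_machines)
    ]


# Placeholder address and port; replace them with your own values.
for command in multi_node_commands(2, "10.0.0.1", 29500):
    print(command)
```

Run the command printed for rank 0 on the master machine and each remaining command on its own machine.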

Results on COCO2017 val set

model	assignment	with NMS	lr sched.	mAP	mAR	download
FCOS	one-to-many	Yes	3x + ms	41.4	59.1	weight | log
FCOS baseline	one-to-many	Yes	3x + ms	40.9	58.4	weight | log
Anchor	one-to-one	No	3x + ms	37.1	60.5	weight | log
Center	one-to-one	No	3x + ms	35.2	61.0	weight | log
Foreground Loss	one-to-one	No	3x + ms	38.7	62.2	weight | log
POTO	one-to-one	No	3x + ms	39.2	61.7	weight | log
POTO + 3DMF	one-to-one	No	3x + ms	40.6	61.6	weight | log
POTO + 3DMF + Aux	mixture*	No	3x + ms	41.4	61.5	weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

The paper uses a 2x + ms schedule, but we adopt a 3x + ms schedule here to achieve higher performance.
It is normal to observe about 0.3 AP of noise in POTO.

Results on CrowdHuman val set

model	assignment	with NMS	lr sched.	AP50	mMR	recall	download
FCOS	one-to-many	Yes	30k iters	86.1	54.9	94.2	weight | log
ATSS	one-to-many	Yes	30k iters	87.2	49.7	94.0	weight | log
POTO	one-to-one	No	30k iters	88.5	52.2	96.3	weight | log
POTO + 3DMF	one-to-one	No	30k iters	88.8	51.0	96.6	weight | log
POTO + 3DMF + Aux	mixture*	No	30k iters	89.1	48.9	96.5	weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

It is normal to observe about 0.3 AP of noise in POTO, and about 1.0 mMR of noise in all methods.

Ablations on COCO2017 val set

model	assignment	with NMS	lr sched.	mAP	mAR	note
POTO	one-to-one	No	6x + ms	40.0	61.9
POTO	one-to-one	No	9x + ms	40.2	62.3
POTO	one-to-one	No	3x + ms	39.2	61.1	replace the Hungarian algorithm with `argmax`
POTO + 3DMF	one-to-one	No	3x + ms	40.9	62.0	remove GN in 3DMF
POTO + 3DMF + Aux	mixture*	No	3x + ms	41.5	61.5	remove GN in 3DMF

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

For one-to-one assignment, more training iterations lead to higher performance.
The argmax (also known as top-1) operation is an approximate solution to bipartite matching in dense prediction methods.
Removing GN in 3DMF appears to be harmless and also leads to higher inference speed.
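
To illustrate why argmax only approximates bipartite matching, here is a minimal sketch that compares the two on small quality matrices. It is not the code used in this repo: the quality values are made up, the function names are hypothetical, and the Hungarian step is delegated to `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_assign(quality):
    """One-to-one assignment that maximizes the total quality.

    Rows are ground-truth objects, columns are predictions.
    """
    gt_idx, pred_idx = linear_sum_assignment(quality, maximize=True)
    return dict(zip(gt_idx.tolist(), pred_idx.tolist()))


def argmax_assign(quality):
    """Top-1 approximation: each ground truth independently takes its best prediction."""
    return {gt: int(pred) for gt, pred in enumerate(quality.argmax(axis=1))}


# Made-up quality scores for 2 ground truths x 4 predictions.
easy = np.array([[0.9, 0.2, 0.1, 0.0],
                 [0.1, 0.3, 0.8, 0.2]])
# Both ground truths prefer prediction 1 here, so argmax produces a conflict.
crowded = np.array([[0.2, 0.9, 0.1, 0.0],
                    [0.1, 0.8, 0.7, 0.2]])

print(hungarian_assign(easy), argmax_assign(easy))
print(hungarian_assign(crowded), argmax_assign(crowded))
```

On `easy` the two methods agree. On `crowded`, argmax assigns prediction 1 to both ground truths, while the Hungarian solution moves the second ground truth to prediction 2 so that the assignment stays one-to-one.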

Acknowledgement

This repo is developed based on cvpods. Please check cvpods for more details and features.

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citing

If you use this work in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wang2020end,
  title   =  {End-to-End Object Detection with Fully Convolutional Network},
  author  =  {Wang, Jianfeng and Song, Lin and Li, Zeming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
  journal =  {arXiv preprint arXiv:2012.03544},
  year    =  {2020}
}

Contributing to the project

Pull requests and issues about the implementation are welcome. If you have an issue with the library itself (e.g. installation or environment setup), please refer to cvpods.
