Object detection, 3D detection, and pose estimation using center point detection:

Overview

Objects as Points

Object detection, 3D detection, and pose estimation using center point detection:

Objects as Points,
Xingyi Zhou, Dequan Wang, Philipp Krähenbühl,
arXiv technical report (arXiv 1904.07850)

Contact: [email protected]. Any questions or discussions are welcomed!

Updates

  • (June, 2020) We released a state-of-the-art Lidar-based 3D detection and tracking framework CenterPoint.
  • (April, 2020) We released a state-of-the-art (multi-category-/ pose-/ 3d-) tracking extension CenterTrack.

Abstract

Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point -- the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.

Highlights

  • Simple: One-sentence method summary: use keypoint detection technic to detect the bounding box center point and regress to all other object properties like bounding box size, 3d information, and pose.

  • Versatile: The same framework works for object detection, 3d bounding box estimation, and multi-person pose estimation with minor modification.

  • Fast: The whole process in a single network feedforward. No NMS post processing is needed. Our DLA-34 model runs at 52 FPS with 37.4 COCO AP.

  • Strong: Our best single model achieves 45.1AP on COCO test-dev.

  • Easy to use: We provide user friendly testing API and webcam demos.

Main results

Object Detection on COCO validation

Backbone AP / FPS Flip AP / FPS Multi-scale AP / FPS
Hourglass-104 40.3 / 14 42.2 / 7.8 45.1 / 1.4
DLA-34 37.4 / 52 39.2 / 28 41.7 / 4
ResNet-101 34.6 / 45 36.2 / 25 39.3 / 4
ResNet-18 28.1 / 142 30.0 / 71 33.2 / 12

Keypoint detection on COCO validation

Backbone AP FPS
Hourglass-104 64.0 6.6
DLA-34 58.9 23

3D bounding box detection on KITTI validation

Backbone FPS AP-E AP-M AP-H AOS-E AOS-M AOS-H BEV-E BEV-M BEV-H
DLA-34 32 96.9 87.8 79.2 93.9 84.3 75.7 34.0 30.5 26.8

All models and details are available in our Model zoo.

Installation

Please refer to INSTALL.md for installation instructions.

Use CenterNet

We support demo for image/ image folder, video, and webcam.

First, download the models (By default, ctdet_coco_dla_2x for detection and multi_pose_dla_3x for human pose estimation) from the Model zoo and put them in CenterNet_ROOT/models/.

For object detection on images/ video, run:

python demo.py ctdet --demo /path/to/image/or/folder/or/video --load_model ../models/ctdet_coco_dla_2x.pth

We provide example images in CenterNet_ROOT/images/ (from Detectron). If set up correctly, the output should look like

For webcam demo, run

python demo.py ctdet --demo webcam --load_model ../models/ctdet_coco_dla_2x.pth

Similarly, for human pose estimation, run:

python demo.py multi_pose --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/multi_pose_dla_3x.pth

The result for the example images should look like:

You can add --debug 2 to visualize the heatmap outputs. You can add --flip_test for flip test.

To use this CenterNet in your own project, you can

import sys
CENTERNET_PATH = /path/to/CenterNet/src/lib/
sys.path.insert(0, CENTERNET_PATH)

from detectors.detector_factory import detector_factory
from opts import opts

MODEL_PATH = /path/to/model
TASK = 'ctdet' # or 'multi_pose' for human pose estimation
opt = opts().init('{} --load_model {}'.format(TASK, MODEL_PATH).split(' '))
detector = detector_factory[opt.task](opt)

img = image/or/path/to/your/image/
ret = detector.run(img)['results']

ret will be a python dict: {category_id : [[x1, y1, x2, y2, score], ...], }

Benchmark Evaluation and Training

After installation, follow the instructions in DATA.md to setup the datasets. Then check GETTING_STARTED.md to reproduce the results in the paper. We provide scripts for all the experiments in the experiments folder.

Develop

If you are interested in training CenterNet in a new dataset, use CenterNet in a new task, or use a new network architecture for CenterNet, please refer to DEVELOP.md. Also feel free to send us emails for discussions or suggestions.

Third-party resources

License

CenterNet itself is released under the MIT License (refer to the LICENSE file for details). Portions of the code are borrowed from human-pose-estimation.pytorch (image transform, resnet), CornerNet (hourglassnet, loss functions), dla (DLA network), DCNv2(deformable convolutions), tf-faster-rcnn(Pascal VOC evaluation) and kitti_eval (KITTI dataset evaluation). Please refer to the original License of these projects (See NOTICE).

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2019objects,
  title={Objects as Points},
  author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={arXiv preprint arXiv:1904.07850},
  year={2019}
}
Owner
Xingyi Zhou
CS Ph.D. student at UT Austin.
Xingyi Zhou
Source code of "Hold me tight! Influence of discriminative features on deep network boundaries"

Hold me tight! Influence of discriminative features on deep network boundaries This is the source code to reproduce the experiments of the NeurIPS 202

EPFL LTS4 19 Dec 10, 2021
This initial strategy was developed specifically for larger pools and is based on taking a moving average and deriving Bollinger Bands to create a projected active liquidity range.

Gamma's Strategy One This initial strategy was developed specifically for larger pools and is based on taking a moving average and deriving Bollinger

Gamma Strategies 46 Dec 02, 2022
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

AI2 114 Jan 06, 2023
Deep Learning Pipelines for Apache Spark

Deep Learning Pipelines for Apache Spark The repo only contains HorovodRunner code for local CI and API docs. To use HorovodRunner for distributed tra

Databricks 2k Jan 08, 2023
sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

Jonathan Shobrook 305 Dec 21, 2022
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023
MaRS - a recursive filtering framework that allows for truly modular multi-sensor integration

The Modular and Robust State-Estimation Framework, or short, MaRS, is a recursive filtering framework that allows for truly modular multi-sensor integration

Control of Networked Systems - University of Klagenfurt 143 Dec 29, 2022
A tool to analyze leveraged liquidity mining and find optimal option combination for hedging.

LP-Option-Hedging Description A Python program to analyze leveraged liquidity farming/mining and find the optimal option combination for hedging imper

Aureliano 18 Dec 19, 2022
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

Yuchao Zhang 204 Jul 14, 2022
Generative Models for Graph-Based Protein Design

Graph-Based Protein Design This repo contains code for Generative Models for Graph-Based Protein Design by John Ingraham, Vikas Garg, Regina Barzilay

John Ingraham 159 Dec 15, 2022
Official implementation of the paper "Steganographer Detection via a Similarity Accumulation Graph Convolutional Network"

SAGCN - Official PyTorch Implementation | Paper | Project Page This is the official implementation of the paper "Steganographer detection via a simila

ZHANG Zhi 1 Nov 26, 2021
A Lightweight Experiment & Resource Monitoring Tool 📺

Lightweight Experiment & Resource Monitoring 📺 "Did I already run this experiment before? How many resources are currently available on my cluster?"

170 Dec 28, 2022
Pytorch code for "Text-Independent Speaker Verification Using 3D Convolutional Neural Networks".

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

Amirsina Torfi 114 Dec 18, 2022
Pytorch library for end-to-end transformer models training and serving

Pytorch library for end-to-end transformer models training and serving

Mikhail Grankin 768 Jan 01, 2023
Artifacts for paper "MMO: Meta Multi-Objectivization for Software Configuration Tuning"

MMO: Meta Multi-Objectivization for Software Configuration Tuning This repository contains the data and code for the following paper that is currently

0 Nov 17, 2021
FAVD: Featherweight Assisted Vulnerability Discovery

FAVD: Featherweight Assisted Vulnerability Discovery This repository contains the replication package for the paper "Featherweight Assisted Vulnerabil

secureIT 4 Sep 16, 2022
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Thank you for you

Weirui Ye 671 Jan 03, 2023
Code to accompany our paper "Continual Learning Through Synaptic Intelligence" ICML 2017

Continual Learning Through Synaptic Intelligence This repository contains code to reproduce the key findings of our path integral approach to prevent

Ganguli Lab 82 Nov 03, 2022
Implement the Pareto Optimizer and pcgrad to make a self-adaptive loss for multi-task

multi-task_losses_optimizer Implement the Pareto Optimizer and pcgrad to make a self-adaptive loss for multi-task 已经实验过了,不会有cuda out of memory情况 ##Par

14 Dec 25, 2022
PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

R2Plus1D-PyTorch PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal

Irhum Shafkat 342 Dec 16, 2022