Localization Distillation for Object Detection

Overview

Localization Distillation for Object Detection

This repo is based on mmDetection.

This is the code for our paper:

LD is the extension of knowledge distillation on localization task, which utilizes the learned bbox distributions to transfer the localization dark knowledge from teacher to student.

LD stably improves over GFocalV1 about ~0.8 AP and ~1 AR100 without adding any computational cost!

Introduction

Knowledge distillation (KD) has witnessed its powerful ability in learning compact models in deep learning field, but it is still limited in distilling localization information for object detection. Existing KD methods for object detection mainly focus on mimicking deep features between teacher model and student model, which not only is restricted by specific model architectures, but also cannot distill localization ambiguity. In this paper, we first propose localization distillation (LD) for object detection. In particular, our LD can be formulated as standard KD by adopting the general localization representation of bounding box. Our LD is very flexible, and is applicable to distill localization ambiguity for arbitrary architecture of teacher model and student model. Moreover, it is interesting to find that Self-LD, i.e., distilling teacher model itself, can further boost state-of-the-art performance. Second, we suggest a teacher assistant (TA) strategy to fill the possible gap between teacher model and student model, by which the distillation effectiveness can be guaranteed even the selected teacher model is not optimal. On benchmark datasets PASCAL VOC and MS COCO, our LD can consistently improve the performance for student detectors, and also boosts state-of-the-art detectors notably.

Installation

Please refer to INSTALL.md for installation and dataset preparation.

Get Started

Please see GETTING_STARTED.md for the basic usage of MMDetection.

Train

# assume that you are under the root directory of this project,
# and you have activated your virtual environment if needed.
# and with COCO dataset in 'data/coco/'

./tools/dist_train.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py 8

Learning rate setting

lr=(samples_per_gpu * num_gpu) / 16 * 0.01

For 2 GPUs and mini-batch size 6, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.00375, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=3,

For 8 GPUs and mini-batch size 16, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=2,

Convert model

After training with LD, the weight file .pth will be large. You'd better convert the model to save a new small one. See convert_model.py#L38-L40, you can set them to your .pth file and config file. Then, run

python convert_model.py

Speed Test (FPS)

CUDA_VISIBLE_DEVICES=0 python3 ./tools/benchmark.py configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth

COCO Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth 8 --eval bbox

GFocalV1 with LD

Teacher Student Training schedule Mini-batch size AP (val) AP50 (val) AP75 (val) AP (test-dev) AP50 (test-dev) AP75 (test-dev) AR100 (test-dev)
-- R-18 1x 6 35.8 53.1 38.2 36.0 53.4 38.7 55.3
R-101 R-18 1x 6 36.5 52.9 39.3 36.8 53.5 39.9 56.6
-- R-34 1x 6 38.9 56.6 42.2 39.2 56.9 42.3 58.0
R-101 R-34 1x 6 39.8 56.6 43.1 40.0 57.1 43.5 59.3
-- R-50 1x 6 40.1 58.2 43.1 40.5 58.8 43.9 59.0
R-101 R-50 1x 6 41.1 58.7 44.9 41.2 58.8 44.7 59.8
-- R-101 2x 6 44.6 62.9 48.4 45.0 63.6 48.9 62.3
R-101-DCN R-101 2x 6 45.4 63.1 49.5 45.6 63.7 49.8 63.3

GFocalV1 with Self-LD

Teacher Student Training schedule Mini-batch size AP (val) AP50 (val) AP75 (val)
-- R-18 1x 6 35.8 53.1 38.2
R-18 R-18 1x 6 36.1 52.9 38.5
-- R-50 1x 6 40.1 58.2 43.1
R-50 R-50 1x 6 40.6 58.2 43.8
-- X-101-32x4d-DCN 1x 4 46.9 65.4 51.1
X-101-32x4d-DCN X-101-32x4d-DCN 1x 4 47.5 65.8 51.8

GFocalV2 with LD

Teacher Student Training schedule Mini-batch size AP (test-dev) AP50 (test-dev) AP75 (test-dev) AR100 (test-dev)
-- R-50 2x 16 44.4 62.3 48.5 62.4
R-101 R-50 2x 16 44.8 62.4 49.0 63.1
-- R-101 2x 16 46.0 64.1 50.2 63.5
R-101-DCN R-101 2x 16 46.8 64.5 51.1 64.3
-- R-101-DCN 2x 16 48.2 66.6 52.6 64.4
R2-101-DCN R-101-DCN 2x 16 49.1 67.1 53.7 65.6
-- X-101-32x4d-DCN 2x 16 49.0 67.6 53.4 64.7
R2-101-DCN X-101-32x4d-DCN 2x 16 50.2 68.3 54.9 66.3
-- R2-101-DCN 2x 16 50.5 68.9 55.1 66.2
R2-101-DCN R2-101-DCN 2x 16 51.0 69.1 55.9 66.8

VOC Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r18_fpn_voc.py work_dirs/ld_gflv1_r101_r18_fpn_voc/epoch_4.pth 8 --eval mAP

GFocalV1 with LD

Teacher Student Training Epochs Mini-batch size AP AP50 AP75
-- R-18 4 6 51.8 75.8 56.3
R-101 R-18 4 6 53.0 75.9 57.6
-- R-50 4 6 55.8 79.0 60.7
R-101 R-50 4 6 56.1 78.5 61.2
-- R-34 4 6 55.7 78.9 60.6
R-101-DCN R-34 4 6 56.7 78.4 62.1
-- R-101 4 6 57.6 80.4 62.7
R-101-DCN R-101 4 6 58.4 80.2 63.7

This is an example of evaluation results (R-101→R-18).

+-------------+------+-------+--------+-------+
| class       | gts  | dets  | recall | ap    |
+-------------+------+-------+--------+-------+
| aeroplane   | 285  | 4154  | 0.081  | 0.030 |
| bicycle     | 337  | 7124  | 0.125  | 0.108 |
| bird        | 459  | 5326  | 0.096  | 0.018 |
| boat        | 263  | 8307  | 0.065  | 0.034 |
| bottle      | 469  | 10203 | 0.051  | 0.045 |
| bus         | 213  | 4098  | 0.315  | 0.247 |
| car         | 1201 | 16563 | 0.193  | 0.131 |
| cat         | 358  | 4878  | 0.254  | 0.128 |
| chair       | 756  | 32655 | 0.053  | 0.027 |
| cow         | 244  | 4576  | 0.131  | 0.109 |
| diningtable | 206  | 13542 | 0.150  | 0.117 |
| dog         | 489  | 6446  | 0.196  | 0.076 |
| horse       | 348  | 5855  | 0.144  | 0.036 |
| motorbike   | 325  | 6733  | 0.052  | 0.017 |
| person      | 4528 | 51959 | 0.099  | 0.037 |
| pottedplant | 480  | 12979 | 0.031  | 0.009 |
| sheep       | 242  | 4706  | 0.132  | 0.060 |
| sofa        | 239  | 9640  | 0.192  | 0.060 |
| train       | 282  | 4986  | 0.142  | 0.042 |
| tvmonitor   | 308  | 7922  | 0.078  | 0.045 |
+-------------+------+-------+--------+-------+
| mAP         |      |       |        | 0.069 |
+-------------+------+-------+--------+-------+
AP:  0.530091167986393
['AP50: 0.759393', 'AP55: 0.744544', 'AP60: 0.724239', 'AP65: 0.693551', 'AP70: 0.639848', 'AP75: 0.576284', 'AP80: 0.489098', 'AP85: 0.378586', 'AP90: 0.226534', 'AP95: 0.068834']
{'mAP': 0.7593928575515747}

Note:

  • For more experimental details, please refer to GFocalV1, GFocalV2 and mmdetection.
  • According to ATSS, there is no gap between box-based regression and point-based regression. Personal conjectures: 1) If xywh form is able to work when using general distribution (apply uniform subinterval division for xywh), our LD can also work in xywh form. 2) If xywh form with general distribution cannot obtain better result, then the best modification is to firstly switch xywh form to tblr form and then apply general distribution and LD. Consequently, whether xywh form + general distribution works or not, our LD benefits for all the regression-based detector.

Pretrained weights

VOC COCO
GFocalV1 teacher R101 pan.baidu pw: ufc8 GFocalV1 + LD R101_R18_1x pan.baidu pw: hj8d
GFocalV1 teacher R101DCN pan.baidu pw: 5qra GFocalV1 + LD R101_R50_1x pan.baidu pw: bvzz
GFocalV1 + LD R101_R18 pan.baidu pw: 1bd3 GFocalV2 + LD R101_R50_2x pan.baidu pw: 3jtq
GFocalV1 + LD R101DCN_R34 pan.baidu pw: thuw GFocalV2 + LD R101DCN_R101_2x pan.baidu pw: zezq
GFocalV1 + LD R101DCN_R101 pan.baidu pw: mp8t GFocalV2 + LD R2N_R101DCN_2x pan.baidu pw: fsbm
GFocalV2 + LD R2N_X101_2x pan.baidu pw: 9vcc
GFocalV2 + Self-LD R2N_R2N_2x pan.baidu pw: 9azn

For any other teacher model, you can download at GFocalV1, GFocalV2 and mmdetection.

Score voting Cluster-DIoU-NMS

We provide Score voting Cluster-DIoU-NMS which is a speed up version of score voting NMS and combination with DIoU-NMS. For GFocalV1 and GFocalV2, Score voting Cluster-DIoU-NMS will bring 0.1-0.3 AP increase, 0.2-0.5 AP75 increase, <=0.4 AP50 decrease and <=1.5 FPS decrease, while it is much faster than score voting NMS in mmdetection. The relevant portion of the config file would be:

# Score voting Cluster-DIoU-NMS
test_cfg = dict(
nms=dict(type='voting_cluster_diounms', iou_threshold=0.6),

# Original NMS
test_cfg = dict(
nms=dict(type='nms', iou_threshold=0.6),

Citation

If you find LD useful in your research, please consider citing:

@Article{zheng2021LD,
  title={Localization Distillation for Object Detection},
  author= {Zhaohui Zheng, Rongguang Ye, Ping Wang, Jun Wang, Dongwei Ren, Wangmeng Zuo},
  journal={arXiv:2102.12252},
  year={2021}
}
Owner
Master student
Code base of object detection

rmdet code base of object detection. 环境安装: 1. 安装conda python环境 - `conda create -n xxx python=3.7/3.8` - `conda activate xxx` 2. 运行脚本,自动安装pytorch1

3 Mar 08, 2022
URIE: Universal Image Enhancementfor Visual Recognition in the Wild

URIE: Universal Image Enhancementfor Visual Recognition in the Wild This is the implementation of the paper "URIE: Universal Image Enhancement for Vis

Taeyoung Son 43 Sep 12, 2022
PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

snn-localization repo PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch. Install Dependencies Orig

Sami BARCHID 1 Jan 06, 2022
Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow

xRBM Library Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow Installation Using pip: pip install xrbm Examples Tut

Omid Alemi 55 Dec 29, 2022
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Thank you for you

Weirui Ye 671 Jan 03, 2023
A platform to display the carbon neutralization information for researchers, decision-makers, and other participants in the community.

Welcome to Carbon Insight Carbon Insight is a platform aiming to display the carbon neutralization roadmap for researchers, decision-makers, and other

Microsoft 14 Oct 24, 2022
Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

nvdiffrec Joint optimization of topology, materials and lighting from multi-view image observations as described in the paper Extracting Triangular 3D

NVIDIA Research Projects 1.4k Jan 01, 2023
Single-Stage 6D Object Pose Estimation, CVPR 2020

Overview This repository contains the code for the paper Single-Stage 6D Object Pose Estimation. Yinlin Hu, Pascal Fua, Wei Wang and Mathieu Salzmann.

CVLAB @ EPFL 89 Dec 26, 2022
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features

CleanRL (Clean Implementation of RL Algorithms) CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementation

Costa Huang 1.8k Jan 01, 2023
[CVPR 2022 Oral] Balanced MSE for Imbalanced Visual Regression https://arxiv.org/abs/2203.16427

Balanced MSE Code for the paper: Balanced MSE for Imbalanced Visual Regression Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu CVPR 2022 (Oral) News

Jiawei Ren 267 Jan 01, 2023
Code for ICLR 2021 Paper, "Anytime Sampling for Autoregressive Models via Ordered Autoencoding"

Anytime Autoregressive Model Anytime Sampling for Autoregressive Models via Ordered Autoencoding , ICLR 21 Yilun Xu, Yang Song, Sahaj Gara, Linyuan Go

Yilun Xu 22 Sep 08, 2022
Keras-tensorflow implementation of Fully Convolutional Networks for Semantic Segmentation(Unfinished)

Keras-FCN Fully convolutional networks and semantic segmentation with Keras. Models Models are found in models.py, and include ResNet and DenseNet bas

645 Dec 29, 2022
PyTorch implementation of the paper Deep Networks from the Principle of Rate Reduction

Deep Networks from the Principle of Rate Reduction This repository is the official PyTorch implementation of the paper Deep Networks from the Principl

459 Dec 27, 2022
ReGAN: Sequence GAN using RE[INFORCE|LAX|BAR] based PG estimators

Sequence Generation with GANs trained by Gradient Estimation Requirements: PyTorch v0.3 Python 3.6 CUDA 9.1 (For GPU) Origin The idea is from paper Se

40 Nov 03, 2022
Code for Blind Image Decomposition (BID) and Blind Image Decomposition network (BIDeN).

arXiv, porject page, paper Blind Image Decomposition (BID) Blind Image Decomposition is a novel task. The task requires separating a superimposed imag

64 Dec 20, 2022
Gluon CV Toolkit

Gluon CV Toolkit | Installation | Documentation | Tutorials | GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in

Distributed (Deep) Machine Learning Community 5.4k Jan 06, 2023
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

Peter Sharpe 394 Dec 23, 2022
A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms

Denoising Diffusion Probabilistic Model for Proteins Implementation of Denoising Diffusion Probabilistic Model in Pytorch. It is a new approach to gen

Phil Wang 108 Nov 23, 2022
Realtime segmentation with ENet, the fast and accurate segmentation net.

Enet This is a realtime segmentation net with almost 22 fps on GTX1080 ti, and the model size is very small with only 28M. This repo contains the infe

JinTian 14 Aug 30, 2022