Improving Object Detection by Estimating Bounding Box Quality Accurately

Related tags

Deep LearningLQM
Overview

Improving Object Detection by Estimating Bounding Box Quality Accurately

Abstract

Object detection aims to locate and classify object instances in images. Therefore, the object detection model is generally implemented with two parallel branches to optimize localization and classification. After training the detection model, we should select the best bounding box of each class among a number of estimations for reliable inference. Generally, NMS (Non Maximum Suppression) is operated to suppress low-quality bounding boxes by referring to classification scores or center-ness scores. However, since the quality of bounding boxes is not considered, the low-quality bounding boxes can be accidentally selected as a positive bounding box for the corresponding class. We believe that this misalignment between two parallel tasks causes degrading of the object detection performance. In this paper, we propose a method to estimate bounding boxes' quality using four-directional Gaussian quality modeling, which leads the consistent results between two parallel branches. Extensive experiments on the MS COCO benchmark show that the proposed method consistently outperforms the baseline (FCOS). Eventually, our best model offers the state-of-the-art performance by achieving 48.9% in AP. We also confirm the efficiency of the method by comparing the number of parameters and computational overhead.

Overall Architecture

Implementation Details

We implement our detection model on top of MMDetection (v2.6), an open source object detection toolbox. If not specified separately, the default settings of FCOS implementation are not changed. We train and validate our network on four RTX TITAN GPUs in the environment of Pytorch v1.6 and CUDA v10.2.

Please see GETTING_STARTED.md for the basic usage of MMDetection.

Installation


  1. Clone the this repository.

    git clone https://github.com/sanghun3819/LQM.git
    cd LQM
  2. Create a conda virtural environment and install dependencies.

    conda env create -f environment.yml
  3. Activate conda environment

    conda activate lqm
  4. Install build requirements and then install MMDetection.

    pip install -r requirements/build.txt
    pip install -v -e .

Preparing MS COCO dataset


bash download_coco.sh

Preparing Pre-trained model weights


bash download_weights.sh

Train


# assume that you are under the root directory of this project,
# and you have activated your virtual environment if needed.
# and with COCO dataset in 'data/coco/'

./tools/dist_train.sh configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py 4 --validate

Inference


./tools/dist_test.sh configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py work_dirs/uncertainty_guide_r50_fpn_1x/epoch_12.pth 4 --eval bbox

Image demo using pretrained model weight


# Result will be saved under the demo directory of this project (detection_result.jpg)
# config, checkpoint, source image path are needed (If you need pre-trained weights, you can download them from provided google drive link)
# score threshold is optional

python demo/LQM_image_demo.py --config configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py --checkpoint work_dirs/pretrained/LQM_r50_fpn_1x.pth --img data/coco/test2017/000000011245.jpg --score-thr 0.3

Models


For your convenience, we provide the following trained models. All models are trained with 16 images in a mini-batch with 4 GPUs.

Model Multi-scale training AP (minival) Link
LQM_R50_FPN_1x No 40.0 Google
LQM_R101_FPN_2x Yes 44.8 Google
LQM_R101_dcnv2_FPN_2x Yes 47.4 Google
LQM_X101_FPN_2x Yes 47.2 Google
LQM_X101_dcnv2_FPN_2x Yes 48.9 Google
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022)

A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022) https://arxiv.org/abs/2203.09388 Jianqi Ma, Zheto

MA Jianqi, shiki 104 Jan 05, 2023
PyTorch implementation of MuseMorphose, a Transformer-based model for music style transfer.

MuseMorphose This repository contains the official implementation of the following paper: Shih-Lun Wu, Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-

Yating Music, Taiwan AI Labs 142 Jan 08, 2023
Multiple Object Tracking with Yolov5!

Tracking with yolov5 This implementation is for who need to tracking multi-object only with detector. You can easily track mult-object with your well

9 Nov 08, 2022
Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis"

Beyond the Spectrum Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis" by Yang He, Ning Yu, Margret Keu

Yang He 27 Jan 07, 2023
A set of tools for converting a darknet dataset to COCO format working with YOLOX

darknet格式数据→COCO darknet训练数据目录结构(详情参见dataset/darknet): darknet ├── class.names ├── gen_config.data ├── gen_train.txt ├── gen_valid.txt └── images

RapidAI-NG 148 Jan 03, 2023
A package for "Procedural Content Generation via Reinforcement Learning" OpenAI Gym interface.

Readme: Illuminating Diverse Neural Cellular Automata for Level Generation This is the codebase used to generate the results presented in the paper av

Sam Earle 27 Jan 05, 2023
Python implementation of "Single Image Haze Removal Using Dark Channel Prior"

##Dependencies pillow(~2.6.0) Numpy(~1.9.0) If the scripts throw AttributeError: __float__, make sure your pillow has jpeg support e.g. try: $ sudo ap

Joyee Cheung 73 Dec 20, 2022
Pytorch implementation of "Neural Wireframe Renderer: Learning Wireframe to Image Translations"

Neural Wireframe Renderer: Learning Wireframe to Image Translations Pytorch implementation of ideas from the paper Neural Wireframe Renderer: Learning

Yuan Xue 7 Nov 14, 2022
Fast Neural Representations for Direct Volume Rendering

Fast Neural Representations for Direct Volume Rendering Sebastian Weiss, Philipp Hermüller, Rüdiger Westermann This repository contains the code and s

Sebastian Weiss 20 Dec 03, 2022
Implementation of "Large Steps in Inverse Rendering of Geometry"

Large Steps in Inverse Rendering of Geometry ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), December 2021. Baptiste Nicolet · Alec Jacob

RGL: Realistic Graphics Lab 274 Jan 06, 2023
Convert game ISO and archives to CD CHD for emulation on Linux.

tochd Convert game ISO and archives to CD CHD for emulation. Author: Tuncay D. Source: https://github.com/thingsiplay/tochd Releases: https://github.c

Tuncay 20 Jan 02, 2023
"Neural Turing Machine" in Tensorflow

Neural Turing Machine in Tensorflow Tensorflow implementation of Neural Turing Machine. This implementation uses an LSTM controller. NTM models with m

Taehoon Kim 1k Dec 06, 2022
Official implementation for paper: A Latent Transformer for Disentangled Face Editing in Images and Videos.

A Latent Transformer for Disentangled Face Editing in Images and Videos Official implementation for paper: A Latent Transformer for Disentangled Face

InterDigital 108 Dec 09, 2022
Quantized models with python

quantized-network download .pth files to qmodels/: googlenet : https://download.

adreamxcj 2 Dec 28, 2021
Neural Contours: Learning to Draw Lines from 3D Shapes (CVPR2020)

Neural Contours: Learning to Draw Lines from 3D Shapes This repository contains the PyTorch implementation for CVPR 2020 Paper "Neural Contours: Learn

93 Dec 16, 2022
A script depending on VASP output for calculating Fermi-Softness.

Fermi softness calculation for Vienna Ab initio Simulation Package (VASP) Update 1.1.0: Big update: Rewrote the code. Use Bader atomic division instea

qslin 11 Nov 08, 2022
Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Intel® Neural Compressor targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep l

Intel Corporation 846 Jan 04, 2023
List some popular DeepFake models e.g. DeepFake, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, SimSwap, CihaNet, etc.

deepfake-models List some popular DeepFake models e.g. DeepFake, CihaNet, SimSwap, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, Si

Mingcan Xiang 100 Dec 17, 2022
Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation

OSCAR Project Page | Paper This repository contains the codebase used in OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Ma

NVIDIA Research Projects 74 Dec 22, 2022
Contextualized Perturbation for Textual Adversarial Attack, NAACL 2021

Contextualized Perturbation for Textual Adversarial Attack Introduction This is a PyTorch implementation of Contextualized Perturbation for Textual Ad

cookielee77 30 Jan 01, 2023