Augmenting Anchors by the Detector Itself

Related tags

Computer Visionaadi
Overview

Augmenting Anchors by the Detector Itself

Introduction

It is difficult to determine the scale and aspect ratio of anchors for anchor-based object detection methods. Current state-of-the-art object detectors either determine anchor parameters according to objects' shape and scale in a dataset, or avoid this problem by utilizing anchor-free method. In this paper, we propose a gradient-free anchor augmentation method named AADI, which means Augmenting Anchors by the Detector Itself. AADI is not an anchor-free method, but it converts the scale and aspect ratio of anchors from a continuous space to a discrete space, which greatly alleviates the problem of anchors' designation. Furthermore, AADI does not add any parameters or hyper-parameters, which is beneficial for future research and downstream tasks. Extensive experiments on COCO dataset show that AADI has obvious advantages for both two-stage and single-stage methods, specifically, AADI achieves at least 2.1 AP improvements on Faster R-CNN and 1.6 AP improvements on RetinaNet, using ResNet-50 model. We hope that this simple and cost-efficient method can be widely used in object detection.

  • For RPN

    • Baseline

      Num anchors AR100 AR1000 ARs ARm ARl
      1 45.5 55.6 31.4 52.8 60.0
      3 45.7 58.0 31.4 52.7 61.1
    • Ablation Study

      dilation Anchor Guided AR100 AR1000 ARs ARm ARl
      1 52.8 60.6 40.2 60.8 63.6
      2 54.8 64.7 39.0 63.1 70.6
      2 56.3 66.7 39.5 64.9 73.4
      3 53.7 64.0 35.4 62.1 73.9
      3 55.6 67.6 36.1 64.3 77.6
      4 52.2 60.5 30.9 61.3 76.6
      4 54.4 65.5 33.0 63.7 78.9
  • For RetinaNet

    • Ablation Study

      AADI dilation AP AP50 AP75 APs APm APl
      1 38.2 58.4 41.1 24.3 42.2 48.5
      1 37.3 56.4 40.2 22.0 39.9 46.8
      2 39.8 57.5 43.5 22.1 43.5 50.6
      3 38.3 54.6 41.7 20.0 43.1 51.1
    • With IoU

      AP AP50 AP75 APs APm APl
      40.2 57.7 43.8 24.1 43.1 52.2
    • With 3x schedule (RetinaNet with giou, AADI with smooth l1)

      Model AP AP50 AP75 APs APm APl
      RetinaNet 39.6 59.3 42.2 24.9 43.3 50.7
      AADI-RetinaNet 41.4 59.3 45.2 24.8 44.9 54.0
  • For Faster R-CNN

    • Ablation Study

      AADI dilation AP AP50 AP75 APs APm APl FPS
      1(3 anchors) 37.9 58.8 41.1 22.4 41.1 49.1 26.3
      2 40.3 59.3 44.3 24.2 43.3 52.2 22.4
      3 40.8 59.5 45.0 24.0 44.6 53.1 22.4
      4 40.5 58.7 44.6 23.2 44.8 52.7 22.3
    • 3x schedule

      Backbone AP AP50 AP75 APs APm APl FPS
      R-50 FPN 42.5 61.2 46.5 25.3 46.2 55.5 22.6
      DCN-50 FPN 44.1 63.1 48.2 28.3 46.9 58.4 20.1
      R-101 FPN 44.5 63.2 48.7 26.9 48.3 57.4 17.4
  • Detectron2

Detectron2 is Facebook AI Research's next generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications in Facebook.

Installation

See installation instructions.

Getting Started

See Getting Started with Detectron2, and the Colab Notebook to learn about basic usage.

Learn more at our documentation.

Citing Detectron2

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

@misc{wan2021augmenting,
      title={Augmenting Anchors by the Detector Itself}, 
      author={Xiaopei Wan and Shengjie Chen and Yujiu Yang and Zhenhua Guo and Fangbo Tao},
      year={2021},
      eprint={2105.14086},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

NRSC5-DUI is a graphical interface for nrsc5. It makes it easy to play your favorite FM HD radio stations using an RTL-SDR dongle. It will also displa

61 Dec 22, 2022
End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

Fangneng Zhan 89 Aug 04, 2022
Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. Contributions

758 Dec 22, 2022
An easy to use an (hopefully useful) captcha solution for pyTelegramBotAPI

pyTelegramBotCAPTCHA An easy to use and (hopefully useful) image CAPTCHA soltion for pyTelegramBotAPI. Installation: pip install pyTelegramBotCAPTCHA

29 Dec 26, 2022
FastOCR is a desktop application for OCR API.

FastOCR FastOCR is a desktop application for OCR API. Installation Arch Linux fastocr-git @ AUR Build from AUR or install with your favorite AUR helpe

Bruce Zhang 58 Jan 07, 2023
Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

kornia 7.6k Jan 04, 2023
A pkg stiching around view images(4-6cameras) to generate bird's eye view.

AVP-BEV-OPEN Please check our new work AVP_SLAM_SIM A pkg stiching around view images(4-6cameras) to generate bird's eye view! View Demo · Report Bug

Xinliang Zhong 37 Dec 01, 2022
The code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Long-term Action Assessment".

Likert Scoring with Grade Decoupling for Long-term Action Assessment This is the code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Lon

10 Oct 21, 2022
Text-to-Image generation

Generate vivid Images for Any (Chinese) text CogView is a pretrained (4B-param) transformer for text-to-image generation in general domain. Read our p

THUDM 1.3k Jan 05, 2023
Natural language detection

Detect the language of text. What’s so cool about franc? franc can support more languages(†) than any other library franc is packaged with support for

Titus 3.8k Jan 02, 2023
This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

Dafang He 30 Oct 22, 2022
Detect and fix skew in images containing text

Alyn Skew detection and correction in images containing text Image with skew Image after deskew Install and use via pip! Recommended way(using virtual

Kakul 230 Dec 21, 2022
M-LSDを用いて四角形を検出し、射影変換を行うサンプルプログラム

M-LSD-warpPerspective-Example M-LSDを用いて四角形を検出し、射影変換を行うサンプルプログラムです。 Requirements OpenCV 3.4.2 or Later tensorflow 2.4.1 or Later Usage 実行方法は以下です。 pytho

KazuhitoTakahashi 9 Oct 14, 2022
TedEval: A Fair Evaluation Metric for Scene Text Detectors

TedEval: A Fair Evaluation Metric for Scene Text Detectors Official Python 3 implementation of TedEval | paper | slides Chae Young Lee, Youngmin Baek,

Clova AI Research 167 Nov 20, 2022
Fun program to overlay a mask to yourself using a webcam

Superhero Mask Overlay Description Simple project made for fun. It consists of placing a mask (a PNG image with transparent background) on your face.

KB Kwan 10 Dec 01, 2022
A tool to make dumpy among us GIFS

Among Us Dumpy Gif Maker Made by ThatOneCalculator & Pixer415 With help from Telk, karl-police, and auguwu! Please credit this repository when you use

Kainoa Kanter 535 Jan 07, 2023
pulse2percept: A Python-based simulation framework for bionic vision

pulse2percept: A Python-based simulation framework for bionic vision Retinal degenerative diseases such as retinitis pigmentosa and macular degenerati

67 Dec 29, 2022
Python Computer Vision Aim Bot for Roblox's Phantom Forces

Python-Phantom-Forces-Aim-Bot Python Computer Vision Aim Bot for Roblox's Phanto

drag0ngam3s 2 Jul 11, 2022
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Harald Scheidl 190 Jan 03, 2023
Table Extraction Tool

Tree Structure - Table Extraction Fonduer has been successfully extended to perform information extraction from richly formatted data such as tables.

HazyResearch 88 Jun 02, 2022