A novel region proposal network for more general object detection (including scene text detection).

Last update: Dec 12, 2022

Overview

DeRPN: Taking a further step toward more general object detection

DeRPN is a novel region proposal network which concentrates on improving the adaptivity of current detectors. The paper is available here.

Recent Update

· Mar. 13, 2019: The DeRPN pretrained models are added.

· Jan. 25, 2019: The code is released.

Contact Us

Contributions to improve DeRPN are welcome. For any questions, please feel free to contact Lele Xie ([email protected]) or Prof. Jin ([email protected]).

Citation

If you find DeRPN useful in your research, please consider citing our paper as follows:

@article{xie2019DeRPN,
  title     = {DeRPN: Taking a further step toward more general object detection},
  author    = {Lele Xie and Yuliang Liu and Lianwen Jin and Zecheng Xie},
  journal   = {AAAI},
  year      = {2019}
}

Main Results

Note: The reimplemented results differ slightly from those presented in the paper because of different training settings, but the conclusions remain consistent. For example, this code doesn't use multi-scale training, which should boost the results of both DeRPN and RPN.

COCO-Text

training data: COCO-Text train

test data: COCO-Text test

	network	[email protected]	[email protected]	[email protected]	[email protected]
RPN+Faster R-CNN	VGG16	32.48	52.54	7.40	17.59
DeRPN+Faster R-CNN	VGG16	47.39	70.46	11.05	25.12
RPN+R-FCN	ResNet-101	37.71	54.35	13.17	22.21
DeRPN+R-FCN	ResNet-101	48.62	71.30	13.37	27.57

Pascal VOC

training data: VOC 07+12 trainval

test data: VOC 07 test

Inference time is evaluated on one TITAN XP GPU.

	network	inference time	[email protected]	[email protected]	AP
RPN+Faster R-CNN	VGG16	64 ms	75.53	42.08	42.60
DeRPN+Faster R-CNN	VGG16	65 ms	76.17	44.97	43.84
RPN+R-FCN	ResNet-101	85 ms	78.87	54.30	50.04
DeRPN+R-FCN (900) *	ResNet-101	84 ms	79.21	54.43	50.28

("*": On the Pascal VOC dataset, we found that it is more suitable to train the DeRPN+R-FCN model with 900 proposals. For the other experiments, we use the default number of proposals to train the models, i.e., 2000 proposals for Faster R-CNN and 300 proposals for R-FCN.)

MS COCO

training data: COCO 2017 train

test data: COCO 2017 test/val

test set	network	AP	AP50	AP75	AP_S	AP_M	AP_L
RPN+Faster R-CNN	VGG16	24.2	45.4	23.7	7.6	26.6	37.3
DeRPN+Faster R-CNN	VGG16	25.5	47.2	25.2	10.3	27.9	36.7
RPN+R-FCN	ResNet-101	27.7	47.9	29.0	10.1	30.2	40.1
DeRPN+R-FCN	ResNet-101	28.4	49.0	29.5	11.1	31.7	40.5

val set	network	AP	AP50	AP75	AP_S	AP_M	AP_L
RPN+Faster R-CNN	VGG16	24.1	45.0	23.8	7.6	27.8	37.8
DeRPN+Faster R-CNN	VGG16	25.5	47.3	25.0	9.9	28.8	37.8
RPN+R-FCN	ResNet-101	27.8	48.1	28.8	10.4	31.2	42.5
DeRPN+R-FCN	ResNet-101	28.4	48.5	29.5	11.5	32.9	42.0

Getting Started

Requirements
Installation
Preparation for Training & Testing
Usage

Requirements

CUDA 8.0 and cuDNN 5.1.
Some Python packages: cython, opencv-python, easydict, etc. Install any that are missing from your system.
Configure Caffe according to your environment (Caffe installation instructions). As the code requires pycaffe, Caffe should be built with Python layers. In Makefile.config, make sure to uncomment this line:

WITH_PYTHON_LAYER := 1

An NVIDIA GPU with more than 6 GB of memory is required for ResNet-101.
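
To confirm that the Python-layer switch is really enabled before building, a small check along the following lines may help. This is only a sketch: the function name has_python_layer is made up for illustration, and the Makefile.config location in the usage comment assumes the bundled Caffe lives in $DeRPN_ROOT/caffe, as the build step in this README suggests.

```shell
# Hypothetical helper (not part of DeRPN): succeeds only if the given
# Makefile.config contains an uncommented "WITH_PYTHON_LAYER := 1" line.
has_python_layer() {
    grep -Eq '^[[:space:]]*WITH_PYTHON_LAYER[[:space:]]*:=[[:space:]]*1([[:space:]]|$)' "$1"
}

# Usage (the path is an assumption):
#   has_python_layer "$DeRPN_ROOT/caffe/Makefile.config" && echo "python layers enabled"
```

A line that is still commented out with a leading "#" does not match, so the function returns a non-zero status in that case.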

Installation

Clone the DeRPN repository

git clone https://github.com/HCIILAB/DeRPN.git
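
The commands below refer to $DeRPN_ROOT, which is not set anywhere in these instructions. Assuming it denotes the root of the cloned repository, and that the clone was made in the current directory, it could be defined like this:

```shell
# Assumption: this runs from the directory in which `git clone` was executed,
# so the repository sits in ./DeRPN.
export DeRPN_ROOT="$(pwd)/DeRPN"
```

If the repository was cloned elsewhere, point the variable at that location instead.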

Build the Cython modules
```
cd $DeRPN_ROOT/lib
make
```

Build caffe and pycaffe

cd $DeRPN_ROOT/caffe
make -j8 && make pycaffe

Preparation for Training & Testing

Dataset

Download the datasets of Pascal VOC 2007 & 2012, MS COCO 2017 and COCO-Text.
You need to put these datasets under the $DeRPN_ROOT/data folder (with symlinks).

For COCO-Text, the folder structure is as follows:

$DeRPN_ROOT/data/coco_text/images/train2014
$DeRPN_ROOT/data/coco_text/images/val2014
$DeRPN_ROOT/data/coco_text/annotations  
# train2014, val2014, and annotations are symlinks from /pth_to_coco2014/train2014, 
# /pth_to_coco2014/val2014 and /pth_to_coco2014/annotations2014/, respectively.
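
The COCO-Text layout above could be created with a few symlinks. The sketch below wraps them in a function; the name link_coco_text and its two arguments are illustrative and not part of DeRPN, and the source directory is assumed to contain train2014, val2014 and annotations2014 as described in the comments above.

```shell
# Hypothetical helper (not part of DeRPN).
#   $1: DeRPN root directory
#   $2: COCO 2014 directory holding train2014, val2014 and annotations2014
link_coco_text() {
    derpn_root="$1"
    coco2014="$2"
    mkdir -p "$derpn_root/data/coco_text/images"
    # -s: symbolic link, -f: replace an existing link, -n: do not descend into it
    ln -sfn "$coco2014/train2014"       "$derpn_root/data/coco_text/images/train2014"
    ln -sfn "$coco2014/val2014"         "$derpn_root/data/coco_text/images/val2014"
    ln -sfn "$coco2014/annotations2014" "$derpn_root/data/coco_text/annotations"
}

# Usage:
#   link_coco_text "$DeRPN_ROOT" /pth_to_coco2014
```

Use absolute paths for the source directory; a relative link target would be resolved relative to the link's own location, not the current directory.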

For COCO, the folder structure is as follows:

$DeRPN_ROOT/data/coco/images/train2017
$DeRPN_ROOT/data/coco/images/val2017
$DeRPN_ROOT/data/coco/images/test-dev2017
$DeRPN_ROOT/data/coco/annotations  
# the symlinks are similar to COCO-Text

For Pascal VOC, the folder structure is as follows:

$DeRPN_ROOT/data/VOCdevkit2007
$DeRPN_ROOT/data/VOCdevkit2012
# VOCdevkit2007 and VOCdevkit2012 are symlinks from $VOCdevkit, which contains VOC2007 and VOC2012.
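
Before starting a long training run, it may be worth checking that the three layouts listed above are in place. The following is a minimal sketch; check_derpn_data is a hypothetical helper, not a script shipped with DeRPN, and it tests only for the paths named in this section.

```shell
# Hypothetical helper (not part of DeRPN): print every expected dataset
# path that is missing under "$1/data"; return non-zero if any is missing.
check_derpn_data() {
    root="$1"
    missing=0
    for rel in \
        coco_text/images/train2014 coco_text/images/val2014 coco_text/annotations \
        coco/images/train2017 coco/images/val2017 coco/images/test-dev2017 coco/annotations \
        VOCdevkit2007 VOCdevkit2012
    do
        # -e follows symlinks, so a dangling link is reported as missing too
        if [ ! -e "$root/data/$rel" ]; then
            echo "missing: $root/data/$rel"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_derpn_data "$DeRPN_ROOT"
```

Datasets you do not intend to use will be reported as missing as well; those messages can be ignored.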

Pretrained models

Please download the ImageNet pretrained models (VGG16 and ResNet-101, password: k4z1), and put them under

$DeRPN_ROOT/data/imagenet_models

We also provide the DeRPN pretrained models here (password: fsd8).

Usage

cd $DeRPN_ROOT
./experiments/scripts/faster_rcnn_derpn_end2end.sh [GPU_ID] [NET] [DATASET]

# e.g., ./experiments/scripts/faster_rcnn_derpn_end2end.sh 0 VGG16 coco_text

Copyright

This code is free to the academic community for research purposes only. For commercial use, please contact Dr. Lianwen Jin: [email protected].

Owner

Deep Learning and Vision Computing Lab, SCUT
