caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Overview

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN, and inclined nms and generate rotated box component is imported from EAST project. Thanks for the author's(@zxytim @argman) help. Please cite this paper if you find this useful.

Contents

  1. Abstract
  2. Structor
  3. Installation
  4. Demo
  5. Test
  6. Train
  7. Experiments
  8. Furthermore

Structor

Code structor

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // eval 2015 icdar dataset F-meaure
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structor

It should have this basic structure

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // images list contains ICDAR2013+ICDAR2015 train dataset, then raw data augmentation the same as the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list

Installation

Install caffe

It is highly recommended to use docker to build environment. More about how to configure docker, see Running with Docker If you are familiar with docker, please run

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you don't familiar with docker, please follow py-R-FCN to install caffe.

Build

    cd src/lib && make
    

Download Model

  1. please download VGG16 pre-trained model on Imagenet, place it to model/imagenet_models/VGG16.v2.caffemodel.
  2. please download VGG16 trained model by this project, place it model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

It is recommended to use UNIX socket to support GUI for docker, plesase open another terminal and type:

    xhost + # may be you need it when open a new terminal
    # docker-compose.yml: mount host  volume : /tmp/.X11-unix to docker volume: /tmp/.X11-unix  
    # pass DISPLAY variable to docker container so host X server can display image in docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
    # then run
    bash ./test.sh

Train

Train data

  • Mine: ICDAR2013+ICDAR2015 train dataset, and raw data augmentation, at last got 15977 images.
  • Paper: ICDAR2015 + 2000 focused scene text images they collected.

Train commands

  1. Go to ./src/lib/datasets/icdar.py, modify images path to let train.py find merge_train.txt images list.
  2. Remove cache in src/data/*.pkl or you can load cached roidb data of this project, and place it to src/data/
    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

note: If you use USE_FLIPPED=True&USE_FLIPPED_QUAD=True, you will get almost 31200 roidb.

Experiments

Mine VS Paper

Approaches Anchor Scales Pooled sizes Inclined NMS Test scales(short side) F-measure(Mine VS paper)
R2CNN-2 (4, 8, 16) (7, 7) Y (720) 71.12% VS 68.49%
R2CNN-3 (4, 8, 16) (7, 7) Y (720) 73.10% VS 74.29%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720) 74.14% VS 74.36%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720, 1200) 79.05% VS 81.80%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720) 74.34% VS 75.34%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720, 1200) 78.70% VS 82.54%

Appendixes

Approaches Anchor Scales aspect ration Pooled sizes Inclined NMS Test scales(short side) F-measure
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720) 74.36%
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720, 1200) VS 81.80%
R2CNN-4-TextBoxes-OHEM (4, 8, 16, 32) (0.5, 1, 2, 3, 5, 7, 10) (7, 7) Y (720) 76.53%

Furthermore

You can try Resnet-50, Resnet-101 and so on.

Owner
candler
a computer vision worker
candler
Satoshi is a discord bot template in python using discord.py that allow you to track some live crypto prices with your own discord bot.

Satoshi ~ DiscordCryptoBot Satoshi is a simple python discord bot using discord.py that allow you to track your favorites cryptos prices with your own

Théo 2 Sep 15, 2022
Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

Scene Text Recognition Recommendations Everythin about Scene Text Recognition SOTA • Papers • Datasets • Code Contents 1. Papers 2. Datasets 2.1 Synth

Deep Learning and Vision Computing Lab, SCUT 197 Jan 05, 2023
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 08, 2022
A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT 151 Dec 12, 2022
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is the official implementation of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation". For more details, please

Pengyuan Lyu 309 Dec 06, 2022
基于Paddle框架的PSENet复现

PSENet-Paddle 基于Paddle框架的PSENet复现 本项目基于paddlepaddle框架复现PSENet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 AIStudio链接 参考项目: whai362-PSENet 环境配置 本项目

QuanHao Guo 4 Apr 24, 2022
Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and limited )

GTA-5-Lane-detection Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and

Danciu Georgian 4 Aug 01, 2021
CellProfiler is a open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automaticall

CellProfiler 732 Dec 23, 2022
Document blur detection based on Laplacian operator and text detection.

Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum

JoeyLr 5 Oct 20, 2022
This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking.

SCUT-CTW1500 Datasets We have updated annotations for both train and test set. Train: 1000 images [images][annos] Additional point annotation for each

Yuliang Liu 600 Dec 18, 2022
color detection using python

colordetection color detection using python In this color detection Python project, we are going to build an application through which you can automat

Ruchith Kumar 1 Nov 04, 2021
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Bailando Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory" [Paper] | [Project Page] | [Vi

Li Siyao 237 Dec 29, 2022
Vietnamese Language Detection and Recognition

Table of Content Introduction (Khôi viết) Dataset (đổi link thui thành 3k5 ảnh mình) Getting Started (An Viết) Requirements Usage Example Training & E

6 May 27, 2022
A python programusing Tkinter graphics library to randomize questions and answers contained in text files

RaffleOfQuestions Um programa simples em python, utilizando a biblioteca gráfica Tkinter para randomizar perguntas e respostas contidas em arquivos de

Gabriel Ferreira Rodrigues 1 Dec 16, 2021
Geometric Augmentation for Text Image

Text Image Augmentation A general geometric augmentation tool for text images in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Ne

Canjie Luo 440 Jan 05, 2023
Virtualdragdrop - Virtual Drag and Drop Using OpenCV and Arduino

Virtualdragdrop - Virtual Drag and Drop Using OpenCV and Arduino

Rizky Dermawan 4 Mar 10, 2022
Computer vision applications project (Flask and OpenCV)

Computer Vision Applications Project This project is at it's initial phase. This is all about the implementation of different computer vision techniqu

Suryam Thapa 1 Jan 26, 2022
document image degradation

ocrodeg The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OC

NVIDIA Research Projects 134 Nov 18, 2022
a micro OCR network with 0.07mb params.

MicroOCR a micro OCR network with 0.07mb params. Layer (type) Output Shape Param # Conv2d-1 [-1, 64, 8,

william 29 Aug 06, 2022
[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

ADAPET This repository contains the official code for the paper: "Improving and Simplifying Pattern Exploiting Training". The model improves and simpl

Rakesh R Menon 138 Dec 26, 2022