TextBoxes++: A Single-Shot Oriented Scene Text Detector

Introduction

This is an application for scene text detection (TextBoxes++) and recognition (CRNN).

TextBoxes++ is a unified framework for oriented scene text detection with a single network. It extends TextBoxes. CRNN is an open-source text recognizer. The TextBoxes++ code is based on SSD and TextBoxes; the CRNN code is modified from the original CRNN implementation.

For more details, please refer to our arXiv paper.

Citing the related works

Please cite the related works in your publications if they help your research:

@article{Liao2018Text,
  title = {{TextBoxes++}: A Single-Shot Oriented Scene Text Detector},
  author = {Minghui Liao and Baoguang Shi and Xiang Bai},
  journal = {{IEEE} Transactions on Image Processing},
  doi  = {10.1109/TIP.2018.2825107},
  url = {https://doi.org/10.1109/TIP.2018.2825107},
  volume = {27},
  number = {8},
  pages = {3676--3690},
  year = {2018}
}

@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} TPAMI},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Contents

  1. Requirements
  2. Installation
  3. Docker
  4. Models
  5. Demo
  6. Train

Requirements

NOTE: There is partial support for a Docker image. See docker/README.md. (Thanks to @mdbenito for the PR.)

Torch7 for CRNN; g++-5; CUDA 8.0; cuDNN 5.1 (cuDNN 6 and cuDNN 7 may fail); OpenCV 3.0.

Please refer to the Caffe installation instructions for the other dependencies.

Installation

  1. Compile TextBoxes++ (this is a modified version of Caffe, so you do not need to install the official Caffe):
cp Makefile.config.example Makefile.config
# Modify Makefile.config according to your Caffe installation.
make -j8
# Make sure to add $CAFFE_ROOT/python to your PYTHONPATH.
make py
  2. Compile CRNN (please refer to CRNN if you have trouble with the compilation):
cd crnn/src/
sh build_cpp.sh
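
After the build, one quick way to confirm that the PYTHONPATH step took effect is to try importing the caffe module. The following is a minimal sketch, not part of this repository; the helper name `module_available` is hypothetical.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "caffe" is the Python module built by "make py" above.
    if module_available("caffe"):
        print("caffe is importable")
    else:
        print("caffe is not importable; check that $CAFFE_ROOT/python is on PYTHONPATH")
```

Run it with the same Python interpreter you plan to use for the demo and training scripts.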

Docker

(Thanks for the PR from @idotobi)

Build Docker Image

docker build -t tbpp_crnn:gpu .

This can take more than an hour, so go get a coffee ;)

Once this is done you can start a container via nvidia-docker.

nvidia-docker run -it --rm tbpp_crnn:gpu bash

To check if the GPU is available inside the docker container you can run nvidia-smi.

It is recommended to mount the ./models and ./crnn/model/ directories so that the container can see the downloaded models.

nvidia-docker run -it \
                  --rm \
                  -v ${PWD}/models:/opt/caffe/models \
                  -v ${PWD}/crnn/model:/opt/caffe/crnn/model \
                  tbpp_crnn:gpu bash

For convenience, this command is executed when running ./run.bash.

Models

  1. pre-trained model on SynthText (used for training): Dropbox; BaiduYun

  2. model trained on ICDAR 2015 Incidental Text (used for testing): Dropbox; BaiduYun

    Please place the above models in "./models/"

    If your data differs substantially from ICDAR 2015 Incidental Text, you should train on your own data, starting from the model pre-trained on SynthText.

  3. CRNN model: Dropbox; BaiduYun

    Please place the crnn model in "./crnn/model/"
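
Before running the demo, it can help to check that both model directories exist and contain files. The sketch below only tests that each directory is present and non-empty; it does not verify specific model file names, since those are not listed here. The function name `missing_model_dirs` is hypothetical and not part of this repository.

```python
import os


def missing_model_dirs(repo_root="."):
    """Return the expected model directories that are absent or empty."""
    expected = [
        os.path.join(repo_root, "models"),         # TextBoxes++ models
        os.path.join(repo_root, "crnn", "model"),  # CRNN model
    ]
    missing = []
    for path in expected:
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_model_dirs("."):
        print("no model files found in: " + path)
```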

Demo

Download the ICDAR 2015 model and place it in "./models/"

python examples/text/demo.py

The detection results and recognition results are in "./demo_images"

Train

Create lmdb data

  1. convert ground truth into "xml" form: example.xml

  2. create train/test lists (train.txt / test.txt) in "./data/text/" with the following form:

     path_to_example1.jpg path_to_example1.xml
     path_to_example2.jpg path_to_example2.xml
    
  3. Run "./data/text/creat_data.sh"
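
The list files from step 2 can be generated with a short script. This is a minimal sketch under two assumptions that are not stated above: each image and its XML annotation share the same base name, and the paths are written exactly as passed in (whether they must be absolute or relative to a particular root should be checked against creat_data.sh). The function names and the directory layout in the usage comment are hypothetical.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def build_list_lines(image_dir, xml_dir):
    """Pair each image with the XML file of the same base name.

    Images without a matching XML file are skipped.
    """
    lines = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTENSIONS:
            continue
        xml_path = os.path.join(xml_dir, stem + ".xml")
        if os.path.isfile(xml_path):
            image_path = os.path.join(image_dir, name)
            lines.append("{} {}".format(image_path, xml_path))
    return lines


def write_list(lines, list_path):
    """Write one 'image_path xml_path' pair per line."""
    with open(list_path, "w") as f:
        for line in lines:
            f.write(line + "\n")


# Example usage (hypothetical directories):
# write_list(build_list_lines("data/text/images", "data/text/xml"),
#            "data/text/train.txt")
```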

Start training

1. modify the lmdb path in modelConfig.py
2. Run "python examples/text/train.py"

Owner

Minghui Liao, a Ph.D. student at Huazhong University of Science and Technology.