Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Last update: Dec 31, 2022

Overview

Convolutional Recurrent Neural Network

This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.

UPDATE Mar 14, 2017 A Docker file has been added to the project. Thanks to @varun-suresh.

UPDATE May 1, 2017 A PyTorch port has been made by @meijieru.

UPDATE Jun 19, 2017 For an end-to-end text detector+recognizer, check out the CTPN+CRNN implementation by @AKSHAYUBHAT.

Build

The software has only been tested on Ubuntu 14.04 (x64). CUDA-enabled GPUs are required. To build the project, first install the latest versions of Torch7, fblualib and LMDB. Please follow their installation instructions respectively. On Ubuntu, lmdb can be installed by apt-get install liblmdb-dev.

To build the project, go to src/ and execute sh build_cpp.sh to build the C++ code. If successful, a file named libcrnn.so should be produced in the src/ directory.

Run demo

A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into directory model/crnn_demo/. Then launch the demo by:

th demo.lua

The demo reads an example image and recognizes its text content.

Example image:

Expected output:

Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)

Another example:

Recognized text: shakeshack (raw: ss-h-a--k-e-ssh--aa-c--k--)

Use pretrained model

The pretrained model can be used for lexicon-free and lexicon-based recognition tasks. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in file utilities.lua for details.

Train a new model

Follow the following steps to train a new model on your own dataset.

Create a new LMDB dataset. A python program is provided in tool/create_dataset.py. Refer to the function createDataset for details (need to pip install lmdb first).
Create model directory under model/. For example, model/foo_model. Then create configuraton file config.lua under the model directory. You can copy model/crnn_demo/config.lua and do modifications.
Go to src/ and execute th main_train.lua ../models/foo_model/. Model snapshots and logging file will be saved into the model directory.

Build using docker

Install docker. Follow the instructions here
Install nvidia-docker - Follow the instructions here
Clone this repo, from this directory run docker build -t crnn_docker .
Once the image is built, the docker can be run using nvidia-docker run -it crnn_docker.

Citation

Please cite the following paper if you are using the code/model in your research paper.

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Acknowledgements

The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.

Please let me know if you encounter any issues.

Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Related tags

Overview

Convolutional Recurrent Neural Network

Build

Run demo

Use pretrained model

Train a new model

Build using docker

Citation

Acknowledgements

Owner

Baoguang Shi

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

Generate text images for training deep learning ocr model

Aloception is a set of package for computer vision: aloscene, alodataset, alonet.

Semantic-based Patch Detection for Binary Programs

Fine tuning keras-ocr python package with custom synthetic dataset from scratch

A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.

A curated list of promising OCR resources

Learn computer graphics by writing GPU shaders!

Ackermann Line Follower Robot Simulation.

Text page dewarping using a "cubic sheet" model

Perspective recovery of text using transformed ellipses

M-LSDを用いて四角形を検出し、射影変換を行うサンプルプログラム

An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

computer vision, image processing and machine learning on the web browser or node.

PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Morphological edge detection or object's boundary detection using erosion and dialation in OpenCV python

Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

Single Shot Text Detector with Regional Attention

Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz.