Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.

Overview

Convolutional Recurrent Neural Network

This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.

UPDATE Mar 14, 2017 A Docker file has been added to the project. Thanks to @varun-suresh.

UPDATE May 1, 2017 A PyTorch port has been made by @meijieru.

UPDATE Jun 19, 2017 For an end-to-end text detector+recognizer, check out the CTPN+CRNN implementation by @AKSHAYUBHAT.

Build

The software has only been tested on Ubuntu 14.04 (x64). CUDA-enabled GPUs are required. To build the project, first install the latest versions of Torch7, fblualib and LMDB. Please follow their installation instructions respectively. On Ubuntu, lmdb can be installed by apt-get install liblmdb-dev.

To build the project, go to src/ and execute sh build_cpp.sh to build the C++ code. If successful, a file named libcrnn.so should be produced in the src/ directory.

Run demo

A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into directory model/crnn_demo/. Then launch the demo by:

th demo.lua

The demo reads an example image and recognizes its text content.

Example image: Example Image

Expected output:

Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)

Another example: Example Image2

Recognized text: shakeshack (raw: ss-h-a--k-e-ssh--aa-c--k--)

Use pretrained model

The pretrained model can be used for lexicon-free and lexicon-based recognition tasks. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in file utilities.lua for details.

Train a new model

Follow the following steps to train a new model on your own dataset.

  1. Create a new LMDB dataset. A python program is provided in tool/create_dataset.py. Refer to the function createDataset for details (need to pip install lmdb first).
  2. Create model directory under model/. For example, model/foo_model. Then create configuraton file config.lua under the model directory. You can copy model/crnn_demo/config.lua and do modifications.
  3. Go to src/ and execute th main_train.lua ../models/foo_model/. Model snapshots and logging file will be saved into the model directory.

Build using docker

  1. Install docker. Follow the instructions here
  2. Install nvidia-docker - Follow the instructions here
  3. Clone this repo, from this directory run docker build -t crnn_docker .
  4. Once the image is built, the docker can be run using nvidia-docker run -it crnn_docker.

Citation

Please cite the following paper if you are using the code/model in your research paper.

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Acknowledgements

The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.

Please let me know if you encounter any issues.

Owner
Baoguang Shi
Researcher at Microsoft
Baoguang Shi
Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text

Christian Bartz 572 Jan 05, 2023
Geometric Augmentation for Text Image

Text Image Augmentation A general geometric augmentation tool for text images in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Ne

Canjie Luo 440 Jan 05, 2023
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 02, 2023
Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

gosseract OCR Golang OCR package, by using Tesseract C++ library. OCR Server Do you just want OCR server, or see the working example of this package?

Hiromu OCHIAI 1.9k Dec 28, 2022
Code for CVPR 2022 paper "SoftGroup for Instance Segmentation on 3D Point Clouds"

SoftGroup We provide code for reproducing results of the paper SoftGroup for 3D Instance Segmentation on Point Clouds (CVPR 2022) Author: Thang Vu, Ko

Thang Vu 231 Dec 27, 2022
In this project we will be using the live feed coming from the webcam to create a virtual mouse with complete functionalities.

Virtual Mouse Using OpenCV In this project we will be using the live feed coming from the webcam to create a virtual mouse using hand tracking. Projec

Hassan Shahzad 8 Dec 20, 2022
POT : Python Optimal Transport

This open source Python library provide several solvers for optimization problems related to Optimal Transport for signal, image processing and machine learning.

Python Optimal Transport 1.7k Jan 04, 2023
A curated list of papers, code and resources pertaining to image composition

A curated list of resources including papers, datasets, and relevant links pertaining to image composition.

BCMI 391 Dec 30, 2022
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
color detection using python

colordetection color detection using python In this color detection Python project, we are going to build an application through which you can automat

Ruchith Kumar 1 Nov 04, 2021
Handwriting Recognition System based on a deep Convolutional Recurrent Neural Network architecture

Handwriting Recognition System This repository is the Tensorflow implementation of the Handwriting Recognition System described in Handwriting Recogni

Edgard Chammas 346 Jan 07, 2023
Page to PAGE Layout Analysis Tool

P2PaLA Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks. 💥 Try our new DEMO for online baseli

Lorenzo Quirós Díaz 180 Nov 24, 2022
Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract Toolset U^2-Net is used for background removal Textcleaner is used for image cleaning

3 Jul 13, 2022
CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

Watson Yang 356 Dec 08, 2022
fishington.io bot with OpenCV and NumPy

fishington.io-bot fishington.io bot with using OpenCV and NumPy bot can continue to fishing fully automatically how to use Open cmd in fishington.io-b

Bahadır Araz 77 Jan 02, 2023
Provides OCR (Optical Character Recognition) services through web applications

OCR4all As suggested by the name one of the main goals of OCR4all is to allow basically any given user to independently perform OCR on a wide variety

174 Dec 31, 2022
chineseocr/table_line 表格线检测模型pytorch版

table_line_pytorch chineseocr/table_detct 表格线检测模型table_line pytorch版 原项目github: https://github.com/chineseocr/table-detect 1、模型转换 下载原项目table_detect模型文

1 Oct 21, 2021
Image Recognition Model Generator

Takes a user-inputted query and generates a machine learning image recognition model that determines if an inputted image is or isn't their query

Christopher Oka 1 Jan 13, 2022
Application that instantly translates sign-language to letters.

Sign Language Translator Project Description The main purpose of project is translating sign-language to letters. In accordance with this purpose we d

3 Sep 29, 2022
Generic framework for historical document processing

dhSegment dhSegment is a tool for Historical Document Processing. Its generic approach allows to segment regions and extract content from different ty

Digital Humanities Laboratory 343 Dec 24, 2022