MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

Last update: Dec 27, 2022

Overview

MORAN is a network with a rectification mechanism for general scene text recognition. The paper was accepted by Pattern Recognition (2019); a preprint is on arXiv, and the final version is now available.

Here is a brief introduction in Chinese.

Recent Update

2019.03.21: Fixed a bug in Fractional Pickup.
Added Python 3 support.

Improvements of MORAN v2:

A more stable rectification network for one-stage training
A ResNet backbone in place of VGG
A bidirectional decoder (a trick borrowed from ASTER)

Version	IIIT5K	SVT	IC03	IC13	SVT-P	CUTE80	IC15 (1811)	IC15 (2077)
MORAN v1 (curriculum training)*	91.2	88.3	95.0	92.4	76.1	77.4	74.7	68.8
MORAN v2 (one-stage training)	93.4	88.3	94.2	93.2	79.7	81.9	77.8	73.9

*The results of v1 were reported in our paper. If this project is helpful for your research, please cite our Pattern Recognition paper.

Requirements

(Contributions to MORAN are welcome.)

We recommend using Anaconda to manage your libraries.

Python 2.7 or Python 3.6 (Python 3 is faster than Python 2)
PyTorch 0.3.* (higher versions cause slow training; please refer to issue #8)
TorchVision
OpenCV
PIL (Pillow)
Colour
LMDB
matplotlib

Alternatively, install the libraries with pip. (The torch version installed this way may differ from the Anaconda version. Check carefully and fix any warnings that appear during training if necessary.)

    pip install -r requirements.txt
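
Because the requirement above pins PyTorch to 0.3.*, it can help to check the installed version before training. The following is a minimal sketch, not part of the repository: the helper name `torch_version_supported` is hypothetical, and it only inspects the version string.

```python
import re


def torch_version_supported(version):
    """Return True if a version string looks like PyTorch 0.3.*.

    Hypothetical helper for illustration; it only parses the string.
    """
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (0, 3)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        if torch_version_supported(torch.__version__):
            print("PyTorch {} matches 0.3.*".format(torch.__version__))
        else:
            print("PyTorch {} is not 0.3.*; training may be slow".format(torch.__version__))
```

Run it once after installation, whether the libraries came from Anaconda or from pip.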

Data Preparation

To use your own dataset, convert it to LMDB format with the tool (run it in Python 2.7) provided by @Baoguang Shi.
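
For orientation, the conversion can be sketched roughly as follows. This is not the tool itself. The key layout (`image-%09d`, `label-%09d`, `num-samples`) is an assumption based on the convention commonly used by CRNN-style LMDB datasets, so verify it against the actual tool before relying on it. The function names are hypothetical, and the sketch targets Python 3 with the `lmdb` package.

```python
def build_cache(samples):
    """Map (image_bytes, label) pairs to LMDB key/value byte strings.

    Assumed key layout: image-%09d and label-%09d (1-based), plus num-samples.
    """
    cache = {}
    count = 0
    for image_bytes, label in samples:
        if not image_bytes:
            continue  # skip samples whose image data is empty
        count += 1
        cache[("image-%09d" % count).encode()] = image_bytes
        cache[("label-%09d" % count).encode()] = label.encode("utf-8")
    cache[b"num-samples"] = str(count).encode()
    return cache


def write_cache(output_path, cache, map_size=1 << 30):
    """Write the cache into an LMDB environment (requires the lmdb package)."""
    import lmdb  # imported lazily so build_cache works without it

    env = lmdb.open(output_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()
```

A typical call would read each image file as raw bytes, pair it with its transcription, pass the pairs to `build_cache`, and hand the result to `write_cache`.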

You can also download the training (NIPS 2014, CVPR 2016) and testing datasets prepared by us.

The raw pictures of testing datasets can be found here.

Training and Testing

Modify the path to dataset folder in train_MORAN.sh:

	--train_nips path_to_dataset \
	--train_cvpr path_to_dataset \
	--valroot path_to_dataset \

Then start training (decrease the learning rate manually to suit your task):

	sh train_MORAN.sh

Training should take less than 20 s per 100 iterations on a 1080 Ti.
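
The speed figure above gives a rough upper bound on total training time. The sketch below is plain arithmetic; the function name and the iteration count in the usage line are hypothetical, not values from this project.

```python
def max_training_hours(total_iterations, seconds_per_100=20.0):
    """Upper bound on wall-clock hours, given seconds per 100 iterations."""
    return total_iterations / 100.0 * seconds_per_100 / 3600.0


# Hypothetical run length of 180,000 iterations at 20 s per 100 iterations.
print(max_training_hours(180000))  # 10.0
```

On slower hardware, pass a measured `seconds_per_100` instead of the default.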

Demo

Download the model parameter file demo.pth.

BaiduCloud (password: l8em)
Google Drive
OneDrive

Put it in the repository root folder, then run demo.py to see more visualizations.

	python demo.py

Citation

@article{cluo2019moran,
  author    = {Canjie Luo and Lianwen Jin and Zenghui Sun},
  title     = {MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition},
  journal   = {Pattern Recognition}, 
  volume    = {90}, 
  pages     = {109--118},
  year      = {2019},
  publisher = {Elsevier}
}

Acknowledgment

This repo is based on @Jieru Mei's crnn.pytorch and @marvis' ocr_attention. Thanks for your contributions.

Attention

This project is free for academic research purposes only.

Owner

Canjie Luo
