The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Overview

Mask TextSpotter

A Pytorch implementation of Mask TextSpotter along with its extension can be find here

Introduction

This is the official implementation of Mask TextSpotter.

Mask TextSpotter is an End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

For more details, please refer to our paper.

Citing the paper

Please cite the paper in your publications if it helps your research:

@inproceedings{LyuLYWB18,
  author    = {Pengyuan Lyu and
               Minghui Liao and
               Cong Yao and
               Wenhao Wu and
               Xiang Bai},
  title     = {Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes},
  booktitle = {Proc. ECCV},
  pages     = {71--88},
  year      = {2018}
}

Contents

  1. Requirements
  2. Installation
  3. Models
  4. Datasets
  5. Test
  6. Train

Requirements

  • NVIDIA GPU, Linux, Python2
  • Caffe2, various standard Python packages

Installation

Caffe2

To install Caffe2 with CUDA support, follow the installation instructions from the Caffe2 website. If you already have Caffe2 installed, make sure to update your Caffe2 to a version that includes the Detectron module.

Please ensure that your Caffe2 installation was successful before proceeding by running the following commands and checking their output as directed in the comments.

# To check if Caffe2 build was successful
python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"

# To check if Caffe2 GPU build was successful
# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

If the caffe2 Python package is not found, you likely need to adjust your PYTHONPATH environment variable to include its location (/path/to/caffe2/build, where build is the Caffe2 CMake build directory).

Install Python dependencies:

pip install numpy pyyaml matplotlib opencv-python>=3.0 setuptools Cython mock

Set up Python modules:

cd $ROOT_DIR/lib && make

Note: Caffe2 is difficult to install sometimes.

Models

Download the model and place it as models/model_iter79999.pkl Our trained model: Google Drive; BaiduYun (key of BaiduYun: gnpc)

Datasets

Download the ICDAR2013(Google Drive, BaiduYun) and ICDAR2015(Google Drive, BaiduYun) as examples. Datasets should be placed in lib/datasets/data/ as below

synth
icdar2013
icdar2015
scut-eng-char
totaltext

If you do not train the model, you can just download the ICDAR2013 or ICDAR2015 datasets for testing.

Test

python tools/test_net.py --cfg configs/text/mask_textspotter.yaml

You can modify the model path or the test dataset in configs/text/mask_textspotter.yaml.

Train

You should format all the datasets you used for training as above. Then modify configs/text/mask_textspotter.yaml to fit the gpus, model path, and datasets.

python tools/train_net.py --cfg configs/text/mask_textspotter.yaml
Owner
Pengyuan Lyu
Pengyuan Lyu
Repository for playing the computer vision apps: People analytics on Raspberry Pi.

play-with-torch Repository for playing the computer vision apps: People analytics on Raspberry Pi. Tools Tested Hardware RasberryPi 4 Model B here, RA

eMHa 1 Sep 23, 2021
Image processing is one of the most common term in computer vision

Image processing is one of the most common term in computer vision. Computer vision is the process by which computers can understand images and videos, and how they are stored, manipulated, and retri

Happy N. Monday 3 Feb 15, 2022
Characterizing possible failure modes in physics-informed neural networks.

Characterizing possible failure modes in physics-informed neural networks This repository contains the PyTorch source code for the experiments in the

Aditi Krishnapriyan 55 Jan 02, 2023
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 07, 2023
Code for paper "Role-based network embedding via structural features reconstruction with degree-regularized constraint"

Role-based network embedding via structural features reconstruction with degree-regularized constraint Train python main.py --dataset brazil-flights

wang zhang 1 Jun 28, 2022
Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

Siva Prakash 11 Jan 02, 2022
CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

简介 基于Tensorflow和Keras实现端到端的不定长中文字符检测和识别 文本检测:CTPN 文本识别:DenseNet + CTC 环境部署 sh setup.sh 注:CPU环境执行前需注释掉for gpu部分,并解开for cpu部分的注释 Demo 将测试图片放入test_images

Yang Chenguang 2.6k Dec 29, 2022
This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

Script_Convertir_PDF_IMG_TXT Este script de pyhton convierte un pdf en Imagen luego utilizando tesseract como motor OCR convierte la Imagen a Texto. p

alebogado 1 Jan 27, 2022
Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

SceneTextPapers Tracking the latest progress in Scene Text Detection and Recognition: must-read papers well organized Information about this repositor

Shangbang Long 763 Jan 01, 2023
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

English | 简体中文 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

27.5k Jan 08, 2023
Scan the MRZ code of a passport and extract the firstname, lastname, passport number, nationality, date of birth, expiration date and personal numer.

PassportScanner Works with 2 and 3 line identity documents. What is this With PassportScanner you can use your camera to scan the MRZ code of a passpo

Edwin Vermeer 441 Dec 24, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
Layout Analysis Evaluator for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts

LayoutAnalysisEvaluator Layout Analysis Evaluator for: ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records ICD

17 Dec 08, 2022
TextBoxes re-implement using tensorflow

TextBoxes-TensorFlow TextBoxes re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modified ba

Gu Xiaodong 44 Dec 29, 2022
[ICCV, 2021] Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks This is an official PyTorch code repository of the paper "Cloud Transformers:

Visual Understanding Lab @ Samsung AI Center Moscow 27 Dec 15, 2022
Program created with opencv that allows you to automatically count your repetitions on several fitness exercises.

Virtual partner of gym Description Program created with opencv that allows you to automatically count your repetitions on several fitness exercises li

1 Jan 04, 2022
Generate a list of papers with publicly available source code in the daily arxiv

2021-06-08 paper code optimal network slicing for service-oriented networks with flexible routing and guaranteed e2e latency networkslicing multi-moda

79 Jan 03, 2023
With the virtual keyboard, you can write on the real time images by combining the thumb and index fingers on the letter you want.

Virtual Keyboard With the virtual keyboard, you can write on the real time images by combining the thumb and index fingers on the letter you want. At

Güldeniz Bektaş 5 Jan 23, 2022
Tesseract Open Source OCR Engine (main repository)

Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM

48.4k Jan 09, 2023
Semantic-based Patch Detection for Binary Programs

PMatch Semantic-based Patch Detection for Binary Programs Requirement tensorflow-gpu 1.13.1 numpy 1.16.2 scikit-learn 0.20.3 ssdeep 3.4 Usage tar -xvz

Mr.Curiosity 3 Sep 02, 2022