A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Last update: Dec 12, 2022

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

A PyTorch implementation of TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes (ECCV 2018) by Megvii

Paper link: arXiv:1807.01544
Github: princewang1994/TextSnake.pytorch
Blog: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Paper

Comparison of different representations for text instances. (a) Axis-aligned rectangle. (b) Rotated rectangle. (c) Quadrangle. (d) TextSnake. Obviously, the proposed TextSnake representation is able to effectively and precisely describe the geometric properties, such as location, scale, and bending of curved text with perspective distortion, while the other representations (axis-aligned rectangle, rotated rectangle or quadrangle) struggle with giving accurate predictions in such cases.

TextSnake elements:

center point
tangent line
text region
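
In the paper, a text instance is modeled as an ordered sequence of overlapping disks along the text center line. Each disk has a center point, a radius, and an orientation given by the tangent line, and the text region is the area the disks cover. The NumPy sketch below illustrates that idea only; the `Disk` class and `render_text_region` function are hypothetical names and are not part of this repository.

```python
from dataclasses import dataclass
import math

import numpy as np


@dataclass
class Disk:
    """One element of the snake (hypothetical helper, not from this repo)."""
    cx: float      # center point, x coordinate
    cy: float      # center point, y coordinate
    radius: float  # half of the local width of the text strip
    theta: float   # tangent direction of the center line, in radians


def render_text_region(disks, height, width):
    """Return a boolean mask covering the union of all disks."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for d in disks:
        mask |= (xs - d.cx) ** 2 + (ys - d.cy) ** 2 <= d.radius ** 2
    return mask


# Disks placed along a gentle arc to mimic a curved word.
disks = []
for i in range(9):
    angle = math.radians(-40 + 10 * i)
    disks.append(Disk(cx=50 + 40 * math.sin(angle),
                      cy=90 - 40 * math.cos(angle),
                      radius=6.0,
                      theta=angle))
mask = render_text_region(disks, height=100, width=100)
print(mask.sum(), "pixels fall inside the reconstructed text region")
```

Because neighboring disks overlap, their union forms one connected band that follows the curve, which a single rectangle or quadrangle cannot do.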

Description

This code has the following features:

includes complete training and inference code
pure Python version that needs no extra compiling
compatible with the latest PyTorch version (written with PyTorch 0.4.0)
supports the Total-Text and SynthText datasets

Getting Started

This repo includes the training code and an inference demo of TextSnake. Both training and inference can be run with a few commands.

Prerequisites

To run this repo successfully, the following setup is highly recommended:

Linux (Ubuntu 16.04)
Python3.6
Anaconda3
NVIDIA GPU (with 8 GB or more GPU memory for training, 2 GB for inference)

(I haven't tested it on other Python versions.)

Clone this repository:

git clone https://github.com/princewang1994/TextSnake.pytorch.git

Python packages can be installed with pip:

$ cd $TEXTSNAKE_ROOT
$ pip install -r requirements.txt

Data preparation

Total-Text: follow dataset/total_text/README.md
SynthText: follow dataset/synth-text/README.md

Pretraining with SynthText

$ CUDA_VISIBLE_DEVICES=$GPUID python train.py synthtext_pretrain --dataset synth-text --viz --max_epoch 1 --batch_size 8

Training

Train the model under a given experiment name $EXPNAME.

Training from scratch:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz

Training from the pretrained model (improves performance considerably):

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz --batch_size 8 --resume save/synthtext_pretrain/textsnake_vgg_0.pth

options:

exp_name: experiment name, used to identify different training processes
--viz: visualization toggle, output pictures are saved to ./vis by default

Other options can be shown by running python train.py -h

Running tests

Running the following command generates a demo on the Total-Text dataset (300 pictures). The results are saved to ./vis by default.

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python eval_textsnake.py $EXPNAME --checkepoch 190

options:

exp_name: experiment name, used to identify different training processes

Other options can be shown by running python train.py -h

Evaluation

The Total-Text metric is included in dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py. First modify input_dir in Deteval.py, then run the following commands to compute DetEval:

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.8 --tp 0.4

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.7 --tp 0.6

Each command outputs a metrics report.
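
The --tr and --tp flags set the area-recall and area-precision thresholds of the DetEval protocol. The sketch below shows only the one-to-one matching test on binary masks. It is not the code in Deteval.py, the full protocol also covers one-to-many and many-to-one matches, and the function name `is_one_to_one_match` is hypothetical.

```python
import numpy as np


def is_one_to_one_match(gt_mask, det_mask, tr=0.7, tp=0.6):
    """Check the DetEval area constraints for one ground-truth/detection pair.

    area recall    = |GT and Det| / |GT|   must reach tr
    area precision = |GT and Det| / |Det|  must reach tp
    """
    inter = np.logical_and(gt_mask, det_mask).sum()
    gt_area = gt_mask.sum()
    det_area = det_mask.sum()
    if gt_area == 0 or det_area == 0:
        return False
    return bool(inter / gt_area >= tr and inter / det_area >= tp)


gt = np.zeros((20, 20), dtype=bool)
gt[5:15, 2:18] = True      # ground-truth region, 160 px
det = np.zeros((20, 20), dtype=bool)
det[5:15, 4:18] = True     # detection covering 140 of those px

# Area recall is 140 / 160 = 0.875 and area precision is 1.0.
loose = is_one_to_one_match(gt, det, tr=0.8, tp=0.4)
strict = is_one_to_one_match(gt, det, tr=0.9, tp=0.4)
print(loose, strict)
```

Raising tr demands that a detection cover more of the ground-truth region, while raising tp demands that less of the detection spill outside it, which is why the two threshold settings give different scores for the same predictions.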

Pretrained Models

SynthText pretrained model: synthtext_fixlr/textsnake_vgg_0.pth (extract code: xmoh)
Total-Text pretrained model: finetune_larger_tcl/textsnake_vgg_180.pth (extract code: dms6)
Google Drive: TextSnake_pretrain

Download from the links above and place each .pth file at the corresponding path (save/XXX/textsnake_vgg_XX.pth).
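
The commands in this README point at checkpoints such as save/synthtext_pretrain/textsnake_vgg_0.pth, which suggests a save/<experiment>/textsnake_vgg_<epoch>.pth layout. A small helper, assuming that layout holds (the function name and its parameters are hypothetical), can confirm that a downloaded file sits where the scripts are likely to look:

```python
from pathlib import Path


def checkpoint_path(exp_name, epoch, save_dir="save", backbone="vgg"):
    """Build save/<exp_name>/textsnake_<backbone>_<epoch>.pth (assumed layout)."""
    return Path(save_dir) / exp_name / f"textsnake_{backbone}_{epoch}.pth"


path = checkpoint_path("synthtext_pretrain", 0)
print(path, "found" if path.is_file() else "missing")
```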

Performance

DetEval reporting

The following table reports DetEval metrics with VGG as the backbone (reproducible with the pretrained models from the Pretrained Models section):

Post-processing	tr=0.7 / tp=0.6 (P | R | F1)	tr=0.8 / tp=0.4 (P | R | F1)	FPS (single 1080Ti)
expand / no merge	0.652 | 0.549 | 0.596	0.874 | 0.711 | 0.784	12.07
expand / merge	0.698 | 0.578 | 0.633	0.859 | 0.660 | 0.746	8.38
no expand / no merge	0.753 | 0.693 | 0.722	0.695 | 0.628 | 0.660	9.94
no expand / merge	0.747 | 0.677 | 0.710	0.691 | 0.602 | 0.643	11.05
reported in paper	-	0.827 | 0.745 | 0.784	-

* expand denotes expanding the radius by 0.3 times during post-processing

* merge denotes merging overlapping instances during post-processing
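
The F1 column is the harmonic mean of precision and recall, so the table rows can be sanity-checked with a few lines of Python. The values below are copied from the tr=0.8 / tp=0.4 column; a difference in the last digit can occur because P and R are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, F1) triples taken from the tr=0.8 / tp=0.4 column of the table.
rows = {
    "expand / no merge": (0.874, 0.711, 0.784),
    "expand / merge": (0.859, 0.660, 0.746),
    "reported on paper": (0.827, 0.745, 0.784),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.3f}, reported {reported:.3f}")
```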

Pure Inference

You can also run prediction on your own dataset without annotations:

Download the pretrained model and place the .pth file at save/pretrained/textsnake_vgg_180.pth
Run the pure inference script as follows:

$ EXPNAME=pretrained
$ CUDA_VISIBLE_DEVICES=$GPUID python demo.py $EXPNAME --checkepoch 180 --img_root /path/to/image

Predicted results are saved in output/$EXPNAME, and visualizations in vis/${EXPNAME}_deploy.

Qualitative results

left: prediction/ground truth
middle: text region(TR)
right: text center line(TCL)

What is coming

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgement

This project is written by Prince Wang; parts of the code refer to songdejia/EAST.
Thanks to techkang for your great help!


Owner

Prince Wang
