Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Last update: Nov 24, 2022

Overview

Box_Discretization_Network

This repository is built on the PyTorch [maskrcnn_benchmark]. The method is the foundation of our ReCTS competition method [link], which won the championship.

PPT link [Google Drive][Baidu Cloud]

Generate your own JSON: [Google Drive][Baidu Cloud]

Brief introduction (in Chinese): [Google Drive][Baidu Cloud]

Competition related

Competition model and config files (these require a large amount of GPU memory):

Paper [Link] (Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection)
Config file [BaiduYun Link]. All models below use this config file; only the directory paths differ. The results below are multi-scale ensemble results. Full details are described in our updated paper.
MLT 2017 Model [BaiduYun Link].

MLT 2017	Recall	Precision	Hmean
new	76.44	82.75	79.47

ReCTS 2019 model [BaiduYun Link]

ReCTS Detection	Recall	Precision	Hmean
new	93.97	92.76	93.36

HRSC_2016 model [BaiduYun Link].

HRSC_2016	Recall	Precision	Hmean	TIoU-Hmean	AP
IJCAI version	94.8	46.0	61.96	51.1	93.7
new	94.1	83.8	88.65	73.3	89.22
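
The Hmean column in the tables above is consistent with the harmonic mean of recall and precision, the usual F-measure in text detection benchmarks. The following is a minimal sketch of that computation, not the evaluation script used for the competition; the function name `hmean` is illustrative only. Because the tabulated recall and precision are themselves rounded, a recomputed value can differ from a reported one in the second decimal.

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# (recall, precision) pairs copied from the "new" rows of the tables above.
reported = {
    "MLT 2017": (76.44, 82.75),
    "ReCTS Detection": (93.97, 92.76),
    "HRSC_2016": (94.1, 83.8),
}

for name, (recall, precision) in reported.items():
    print(f"{name}: Hmean = {hmean(recall, precision):.2f}")
```

For these three rows the recomputed values agree with the reported Hmean to two decimals.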

The online demo is being updated (the old demo version used a wrong configuration). This demo uses the MLT model provided above. It can detect multilingual text but can only recognize English, Chinese, and most symbols.

Description

Please see our paper at [link].

The advantages:

BDN can directly produce compact quadrilateral detection boxes. (Segmentation-based methods need additional steps to group pixels, and such steps are usually sensitive to outliers.)
BDN can avoid label confusion (non-segmentation-based methods are mostly sensitive to label sequence, which can significantly undermine the detection result). The comparison below, on the ICDAR 2015 dataset, shows how well different methods resist the label confusion issue (introduced by adding rotated pseudo samples). Textboxes++, East, and CTD are all sensitive-to-label-sequence methods.

	Textboxes++ [code]	East [code]	CTD [code]	Ours
Variances (Hmean)	↓ 9.7%	↓ 13.7%	↓ 24.6%	↑ 0.3%
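
To make the label confusion issue concrete: the same quadrilateral can be annotated starting from any of its four corners, so a method that regresses corners in a fixed order sees different targets for an identical box. The sketch below illustrates one order-invariant alternative, sorting the x and y coordinates independently, which is one reading of the "sequential-free" idea. It is an illustration under that assumption, not the repository's code, and `order_free_encoding` is a hypothetical name. Sorted coordinates alone do not say which x pairs with which y, so a complete method needs an additional step to recover the pairing.

```python
def order_free_encoding(quad):
    """Return the sorted x values and sorted y values of a quadrilateral.

    The result does not depend on which corner the annotation starts from.
    """
    xs = sorted(x for x, _ in quad)
    ys = sorted(y for _, y in quad)
    return xs, ys


# One text box, annotated clockwise from the top-left corner.
quad = [(10, 20), (110, 35), (105, 80), (5, 65)]
# The same box, annotated starting from the next corner.
shifted = quad[1:] + quad[:1]

print("corner lists identical:", quad == shifted)
print("encodings identical:",
      order_free_encoding(quad) == order_free_encoding(shifted))
```

The two corner lists differ as regression targets, while the sorted encoding is the same for both.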

Getting Started

A basic example for training and testing. This mini example offers a pure baseline that takes less than 4 hours (with four 1080 Ti GPUs) to finish training using only the official training data.

Install anaconda

Link: https://pan.baidu.com/s/1TGy6O3LBHGQFzC20yJo8tg password: vggx

Step-by-step install

conda create --name mb
conda activate mb
conda install ipython
pip install ninja yacs cython matplotlib tqdm scipy shapely
conda install pytorch=1.0 torchvision=0.2 cudatoolkit=9.0 -c pytorch
conda install -c menpo opencv
export INSTALL_DIR=$PWD
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
cd $INSTALL_DIR
git clone https://github.com/Yuliang-Liu/Box_Discretization_Network.git
cd Box_Discretization_Network
python setup.py build develop

MUST USE torchvision=0.2
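
Since the version pins above matter, it can help to check the environment after installation. The following is a small sketch that compares installed versions against the pins from the install steps (pytorch=1.0, torchvision=0.2). The helper names `version_matches` and `check_environment` are hypothetical and not part of this repository; the only assumption about the libraries is that each module exposes a `__version__` string.

```python
import importlib


def version_matches(installed, required_prefix):
    """True if `installed` starts with the dotted components of `required_prefix`.

    Compares whole components, so '0.2.1' matches '0.2' but '0.20.0' does not.
    """
    installed_parts = installed.split("+")[0].split(".")
    required_parts = required_prefix.split(".")
    return installed_parts[:len(required_parts)] == required_parts


def check_environment(pins=(("torch", "1.0"), ("torchvision", "0.2"))):
    """Report, per package, the installed version and whether it matches the pin."""
    report = {}
    for name, required in pins:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = ("not installed", False)
            continue
        version = getattr(module, "__version__", "unknown")
        report[name] = (version, version_matches(version, required))
    return report


if __name__ == "__main__":
    for package, (version, ok) in check_environment().items():
        print(f"{package}: {version} ({'OK' if ok else 'MISMATCH'})")
```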

Pretrained model:

[Link] Unzip under project_root.

(This is ONLY an ImageNet model with a few iterations on ic15 training data for a stable initialization.)

ic15 data

Prepare the data in COCO format. [Link] Unzip under datasets/.
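
For readers preparing their own data, the following sketch builds a minimal COCO-style annotation file from one quadrilateral text box. The top-level keys (`images`, `annotations`, `categories`) and the per-annotation fields follow the standard COCO detection format. Any extra fields this repository may expect are not covered here, and the helper name, file name, image size, and category name are hypothetical.

```python
import json


def quad_to_coco_annotation(quad, image_id, ann_id, category_id=1):
    """Convert four (x, y) corners into one COCO-style annotation dict."""
    quad = [(float(x), float(y)) for x, y in quad]
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    x_min, y_min = min(xs), min(ys)
    # Polygon area via the shoelace formula over consecutive corner pairs.
    following = quad[1:] + quad[:1]
    area = 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(quad, following)))
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO boxes are [x, y, width, height] of the axis-aligned hull.
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        # COCO polygons are flat [x1, y1, x2, y2, ...] lists.
        "segmentation": [[coord for point in quad for coord in point]],
        "area": area,
        "iscrowd": 0,
    }


annotation = quad_to_coco_annotation(
    [(10, 20), (110, 20), (110, 60), (10, 60)], image_id=1, ann_id=1)

dataset = {
    "images": [{"id": 1, "file_name": "img_1.jpg", "width": 1280, "height": 720}],
    "annotations": [annotation],
    "categories": [{"id": 1, "name": "text"}],
}

print(json.dumps(dataset, indent=2))
```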

Train

After downloading data and pretrained model, run

bash quick_train_guide.sh

Test with [TIoU]

Run

bash my_test.sh

Put kes.json into ic15_TIoU_metric/.

Inside ic15_TIoU_metric/, run (first: conda deactivate; pip install Polygon2)

python2 to_eval.py

Example results:

mask branch 79.4 (test segm.json by changing to_eval.py (line 10: mode=0) );
kes branch 80.4;
in .yaml, set RESCORING=True -> 80.8;
Set RESCORING=True and RESCORING_GAMA=0.8 -> 81.0;
One can try many other tricks in the project, such as CROP_PROB_TRAIN, ROTATE_PROB_TRAIN, USE_DEFORMABLE, DEFORMABLE_PSROIPOOLING, PNMS, MSR, and PAN, which were all tested and found effective in improving the results. To achieve state-of-the-art performance, extra data (syntext, MLT, etc.) and proper training strategies are necessary.

Visualization

Run

bash single_image_demo.sh

Citation

If you find our method useful for your research, please cite

@article{liu2019omnidirectional,
  title={Omnidirectional Scene Text Detection with Sequential-free Box Discretization},
  author={Liu, Yuliang and Zhang, Sheng and Jin, Lianwen and Xie, Lele and Wu, Yaqiang and Wang, Zhepeng},
  journal={IJCAI},
  year={2019}
}
@article{liu2019exploring,
  title={Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection},
  author={Liu, Yuliang and He, Tong and Chen, Hao and Wang, Xinyu and Luo, Canjie and Zhang, Shuaitao and Shen, Chunhua and Jin, Lianwen},
  journal={arXiv preprint arXiv:1912.09629},
  year={2019}
}

Feedback

Suggestions and discussions are very welcome. Please contact the authors by email at [email protected] or [email protected]. For commercial usage, please contact Prof. Lianwen Jin via [email protected].

Owner

Yuliang Liu
