Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training

Overview

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training

Alt text

Introduction

This is a PyTorch implementation of "SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training"

The paper propose a novel text detection system termed SelfText Beyond Polygon(SBP) with Bounding Box Supervision(BBS) and Dynamic Self Training~(DST), where training a polygon-based text detector with only a limited set of upright bounding box annotations. As shown in the Figure, SBP achieves the same performance as strong supervision while saving huge data annotation costs.

From more details,please refer to our arXiv paper

Environments

  • python 3
  • torch = 1.1.0
  • torchvision
  • Pillow
  • numpy

ToDo List

  • Release code(BBS)
  • Release code(DST)
  • Document for Installation
  • Document for testing and training
  • Evaluation
  • Demo script
  • re-organize and clean the parameters

Dataset

Supported:

  • ICDAR15
  • ICDAR17MLI
  • sythtext800K
  • TotalText
  • MSRA-TD500
  • CTW1500

model zoo

Supported text detection:

Bounding Box Supervision(BBS)

Train

The training strategy includes three steps: (1) training SASN with synthetic data (2) generating pseudo label on real data based on bounding box annotation with SASN (3) training the detectors(EAST and PSENet) with the pseudo label

training SASN with synthtext or curved synthtext

(TDB)

generating pseudo label on real data with SASN

(TDB)

training EAST or PSENet with the pseudo label

(TDB)

Eval

for example (batchsize=2)

(TDB)

Visualization

Dynamic Self Training

Train

(TDB)

Eval

for example (batchsize=2)

(TDB)

Visualization

Experiments

Bounding Box Supervision

The performance of EAST on ICDAR15

Method Dataset Pretrain precision recall f-score
EAST_box ICDAR15 - 65.8 63.8 64.8
EAST ICDAR15 - 76.9 77.1 77.0
EAST_pseudo(SynthText) ICDAR15 - 77.8 78.2 78.0
EAST_box ICDAR15 SynthText 70.8 72.0 71.4
EAST ICDAR15 SynthText 82.0 82.4 82.2
EAST_pseudo(SynthText) ICDAR15 SynthText 81.3 82.2 81.8

The performance of EAST on MSRA-TD500

Method Dataset Pretrain precision recall f-score
EAST_box MSRA-TD500 - 40.49 31.05 35.15
EAST MSRA-TD500 - 71.76 69.05 70.38
EAST_pseudo(SynthText) MSRA-TD500 - 71.27 67.54 69.36
EAST_box MSRA-TD500 SynthText 48.34 42.37 45.16
EAST MSRA-TD500 SynthText 77.91 76.45 77.17
EAST_pseudo(SynthText) MSRA-TD500 SynthText 77.42 73.85 75.59

The performance of PSENet on ICDAR15

Method Dataset Pretrain precision recall f-score
PSENet_box ICDAR15 - 70.17 69.09 69.63
PSENet ICDAR15 - 81.6 79.5 80.5
PSENet_pseudo(SynthText) ICDAR15 - 82.9 77.6 80.2
PSENet_box ICDAR15 SynthText 72.65 74.29 73.46
PSENet ICDAR15 SynthText 86.42 83.54 84.96
PSENet_pseudo(SynthText) ICDAR15 SynthText 86.77 83.34 85.02

The performance of PSENet on MSRA-TD500

Method Dataset Pretrain precision recall f-score
PSENet_box MSRA-TD500 - 47.17 36.90 41.41
PSENet MSRA-TD500 - 80.86 77.72 79.13
PSENet_pseudo(SynthText) MSRA-TD500 - 80.32 77.26 78.86
PSENet_box MSRA-TD500 SynthText 47.45 39.49 43.11
PSENet MSRA-TD500 SynthText 84.11 84.97 84.54
PSENet_pseudo(SynthText) MSRA-TD500 SynthText 84.03 84.03 84.03

The performance of PSENet on Total Text

Method Dataset Pretrain precision recall f-score
PSENet_box Total Text - 46.5 43.6 45.0
PSENet Total Text - 80.4 76.5 78.4
PSENet_pseudo(SynthText) Total Text - 80.33 73.54 76.78
PSENet_pseudo(Curved SynthText) Total Text - 81.68 74.61 78.0
PSENet_box Total Text SynthText 51.94 47.45 49.59
PSENet Total Text SynthText 83.4 78.1 80.7
PSENet_pseudo(SynthText) Total Text SynthText 81.57 75.54 78.44
PSENet_pseudo(Curved SynthText) Total Text SynthText 82.51 77.57 80.0

The visualization of bounding-box annotation and the pseudo labels generated by BBS on Total-Text The visualization of bounding-box annotation and the pseudo labels generated by BBS on Total-Text

links

https://github.com/SakuraRiven/EAST

https://github.com/WenmuZhou/PSENet.pytorch

License

For academic use, this project is licensed under the Apache License - see the LICENSE file for details. For commercial use, please contact the authors.

Citations

Please consider citing our paper in your publications if the project helps your research.

Eamil: [email protected]

Owner
weijiawu
computer version, OCR I am looking for a research intern or visiting chance.
weijiawu
Papers about explainability of GNNs

Papers about explainability of GNNs

Dongsheng Luo 236 Jan 04, 2023
Fast Learning of MNL Model From General Partial Rankings with Application to Network Formation Modeling

Fast-Partial-Ranking-MNL This repo provides a PyTorch implementation for the CopulaGNN models as described in the following paper: Fast Learning of MN

Xingjian Zhang 3 Aug 19, 2022
Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

66 Dec 15, 2022
TeST: Temporal-Stable Thresholding for Semi-supervised Learning

TeST: Temporal-Stable Thresholding for Semi-supervised Learning TeST Illustration Semi-supervised learning (SSL) offers an effective method for large-

Xiong Weiyu 1 Jul 14, 2022
Stacked Recurrent Hourglass Network for Stereo Matching

SRH-Net: Stacked Recurrent Hourglass Introduction This repository is supplementary material of our RA-L submission, which helps reviewers to understan

28 Jan 03, 2023
TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

TCNN Pandey A, Wang D L. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE Int

凌逆战 16 Dec 30, 2022
The Environment I built to study Reinforcement Learning + Pokemon Showdown

pokemon-showdown-rl-environment The Environment I built to study Reinforcement Learning + Pokemon Showdown Been a while since I ran this. Think it is

3 Jan 16, 2022
Video2x - A lossless video/GIF/image upscaler achieved with waifu2x, Anime4K, SRMD and RealSR.

Official Discussion Group (Telegram): https://t.me/video2x A Discord server is also available. Please note that most developers are only on Telegram.

K4YT3X 5.9k Dec 31, 2022
Official implementation of AAAI-21 paper "Label Confusion Learning to Enhance Text Classification Models"

Description: This is the official implementation of our AAAI-21 accepted paper Label Confusion Learning to Enhance Text Classification Models. The str

101 Nov 25, 2022
Implementation of a Transformer using ReLA (Rectified Linear Attention)

ReLA (Rectified Linear Attention) Transformer Implementation of a Transformer using ReLA (Rectified Linear Attention). It will also contain an attempt

Phil Wang 49 Oct 14, 2022
Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

Kevin Lu 210 Dec 28, 2022
A multi-entity Transformer for multi-agent spatiotemporal modeling.

baller2vec This is the repository for the paper: Michael A. Alcorn and Anh Nguyen. baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotempor

Michael A. Alcorn 56 Nov 15, 2022
Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

Nikita 12 Dec 14, 2022
Implementation for Simple Spectral Graph Convolution in ICLR 2021

Simple Spectral Graph Convolutional Overview This repo contains an example implementation of the Simple Spectral Graph Convolutional (S^2GC) model. Th

allenhaozhu 64 Dec 31, 2022
Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination Pratul P. Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron,

Pratul Srinivasan 65 Dec 14, 2022
DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks

English | 简体中文 Introduction DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks Reference Pat

CV Newbie 28 Dec 13, 2022
Official implementation for the paper: Multi-label Classification with Partial Annotations using Class-aware Selective Loss

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

99 Dec 27, 2022
Official Chainer implementation of GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral)

GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral) [Project] [Paper] [Demo] [Related Work: A2RL (for Auto Image Cropping)] [C

Wu Huikai 402 Dec 27, 2022
Software associated to AAAI paper "Planning with Biological Neurons and Synapses"

jBrain Software associated with the AAAI 2022 paper Francesco D'Amore, Daniel Mitropolsky, Pierluigi Crescenzi, Emanuele Natale, Christos H. Papadimit

Pierluigi Crescenzi 1 Apr 10, 2022
Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images In this paper, we present an effective Dynamic Enhancement Anchor

13 Dec 09, 2022