Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

Overview

Scene Text Recognition Recommendations


Everythin about Scene Text Recognition

SOTA Papers Datasets Code

Contents

1.Papers

All Papers Can be Find Here

  • Latest Papers:
up to (2021-12-8)
up to (2021-12-3)
up to (2021-11-25)

2.Datasets

2.1 Synthetic Datasets

Dataset Description Examples BaiduNetdisk link
SynthText 9 million synthetic text instance images from a set of 90k common English words. Words are rendered onto nartural images with random transformations SynthText Scene text datasets(提取码:emco)
MJSynth 6 million synthetic text instances. It's a generation of SynthText. MJText Scene text datasets(提取码:emco)

2.2 Benchmarks

Dataset Description Examples BaiduNetdisk link
IIIT5k-Words(IIIT5K) 3000 test images instances. Take from street scenes and from originally-digital images IIIT5K Scene text datasets(提取码:emco)
Street View Text(SVT) 647 test images instances. Some images are severely corrupted by noise, blur, and low resolution SVT Scene text datasets(提取码:emco)
StreetViewText-Perspective(SVT-P) 639 test images instances. It is specifically designed to evaluate perspective distorted textrecognition. It is built based on the original SVT dataset by selecting the images at the sameaddress on Google Street View but with different view angles. Therefore, most text instancesare heavily distorted by the non-frontal view angle. SVTP Scene text datasets(提取码:emco)
ICDAR 2003(IC03) 867 test image instances IC03 Scene text datasets(提取码:mfir)
ICDAR 2013(IC13) 1015 test images instances IC13 Scene text datasets(提取码:emco)
ICDAR 2015(IC15) 2077 test images instances. As text images were taken by Google Glasses without ensuringthe image quality, most of the text is very small, blurred, and multi-oriented IC15 Scene text datasets(提取码:emco)
CUTE80(CUTE) 288 It focuses on curved text recognition. Most images in CUTE have acomplex background, perspective distortion, and poor resolution CUTE Scene text datasets(提取码:emco)

3.1 Public Code

3.1. Frameworks

PaddleOCR (百度)

  • PaddlePaddle/PaddleOCR
  • 特性 (截取至PaddleOCR):
    • 使用百度自研深度学习框架PaddlePaddle搭建
    • PP-OCR系列高质量预训练模型,准确的识别效果
      • 超轻量PP-OCRv2系列:检测(3.1M)+ 方向分类器(1.4M)+ 识别(8.5M)= 13.0M
      • 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
      • 通用PPOCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
      • 支持中英文数字组合识别、竖排文本识别、长文本识别
      • 支持多语言识别:韩语、日语、德语、法语
      • 丰富易用的OCR相关工具组件
    • 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注
      • 数据合成工具Style-Text:批量合成大量与目标场景类似的图像
      • 文档分析能力PP-Structure:版面分析与表格识别
      • 支持用户自定义训练,提供丰富的预测推理部署方案
      • 支持PIP快速安装使用
      • 可运行于Linux、Windows、MacOS等多种系统
  • 支持算法(识别):
    • CRNN
    • Rosetta
    • STAR-Net
    • RARE
    • SRN
    • NRTR

MMOCR (商汤)

  • open-mmlab/mmocr
  • 特性(截取至MMOCR):
    • MMOCR 是基于 PyTorchmmdetection 的开源工具箱,专注于文本检测,文本识别以及相应的下游任务,如关键信息提取。 它是 OpenMMLab 项目的一部分。
    • 该工具箱不仅支持文本检测和文本识别,还支持其下游任务,例如关键信息提取。
  • 支持算法(识别)
    • CRNN (TPAMI'2016)
    • NRTR (ICDAR'2019)
    • RobustScanner (ECCV'2020)
    • SAR (AAAI'2019)
    • SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
    • SegOCR (Manuscript'2021)

Deep Text Recognition Benchmark (ClovaAI)


3.2. Algorithms

CRNN


ASTER

  • Tensorflow, official, 651 : bgshih/aster
    • 官方实现版本,使用Tensorflow
  • Pytorch, 535 :ayumuymk/aster.pytorch
    • Pytorch版本,准确率相较原文有明显提升

MORANv2

  • Pytorch, official, 572 :Canjie-Luo/MORAN_v2
    • MORAN v2版本。更加稳定的单阶段训练,更换ResNet做backbone,使用双向解码器

4.SOTA

Regular Dataset Irregular  dataset
Model Year IIIT SVT IC13(857) IC13(1015) IC15(1811) IC15(2077) SVTP CUTE
CRNN  2015 78.2 80.8 - 86.7 - - - -
ASTER(L2R)  2015 92.67 91.16 - 90.74 76.1 - 78.76 76.39
CombBest  2019 87.9 87.5 93.6 92.3 77.6 71.8 79.2 74
ESIR 2019 93.3 90.2 - 91.3 - 76.9 79.6 83.3
SE-ASTER  2020 93.8 89.6 - 92.8 80 81.4 83.6
DAN  2020 94.3 89.2 - 93.9 - 74.5 80 84.4
RobustScanner 2020 95.3 88.1 - 94.8 - 77.1 79.5 90.3
AutoSTR  2020 94.7 90.9 - 94.2 81.8 - 81.7 -
Yang et al.  2020 94.7 88.9 - 93.2 79.5 77.1 80.9 85.4
SATRN  2020 92.8 91.3 - 94.1 - 79 86.5 87.8
SRN  2020 94.8 91.5 95.5 - 82.7 - 85.1 87.8
GA-SPIN  2021 95.2 90.9 - 94.8 82.8 79.5 83.2 87.5
PREN2D  2021 95.6 94 96.4 - 83 - 87.6 91.7
Bhunia et al.  2021 95.2 92.2 - 95.5 - 84 85.7 89.7
VisionLAN  2021 95.8 91.7 95.7 - 83.7 - 86 88.5
ABINet  2021 96.2 93.5 97.4 - 86.0 - 89.3 89.2
MATRN 2021 96.7 94.9 97.9 95.8 86.6 82.9 90.5 94.1

Baek's Reimplementation Version

img

Owner
Deep Learning and Vision Computing Lab, SCUT
Deep Learning and Vision Computing Lab, SCUT
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 08, 2023
Fully-automated scripts for collecting AI-related papers

AI-Paper-Collector Web demo: https://ai-paper-collector.vercel.app/ (recommended) Colab notebook: here Motivation Fully-automated scripts for collecti

772 Dec 30, 2022
Semantic-based Patch Detection for Binary Programs

PMatch Semantic-based Patch Detection for Binary Programs Requirement tensorflow-gpu 1.13.1 numpy 1.16.2 scikit-learn 0.20.3 ssdeep 3.4 Usage tar -xvz

Mr.Curiosity 3 Sep 02, 2022
OCR system for Arabic language that converts images of typed text to machine-encoded text.

Arabic OCR OCR system for Arabic language that converts images of typed text to machine-encoded text. The system currently supports only letters (29 l

Hussein Youssef 144 Jan 05, 2023
Isearch (OSINT) 🔎 Face recognition reverse image search on Instagram profile feed photos.

isearch is an OSINT tool on Instagram. Offers a face recognition reverse image search on Instagram profile feed photos.

Malek salem 20 Oct 25, 2022
Textboxes implementation with Tensorflow (python)

tb_tensorflow A python implementation of TextBoxes Dependencies TensorFlow r1.0 OpenCV2 Code from Chaoyue Wang 03/09/2017 Update: 1.Debugging optimize

Jayne Shin (신재인) 20 May 31, 2019
A simple Digits Recogniser made in Python

⭐ Python Digit Recogniser A simple digit Recogniser made in Python Demo Run Locally Clone the project git clone https://github.com/yashraj-n/python-

Yashraj narke 4 Nov 29, 2021
A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

Roinand Aguila 109 Dec 12, 2022
Hiiii this is the Spanish for Linux and win 10 and in the near future the english version of PortScan my new tool on which you can see what ports are Open only with the IP adress.

PortScanner-by-IIT PortScanner es una herramienta programada en Python3. Como su nombre indica esta herramienta escanea los primeros 150 puertos de re

5 Sep 19, 2022
2 telegram-bots: for image recognition and for text generation

💻 📱 Telegram_Bots 🔎 & 📖 2 telegram-bots: for image recognition and for text generation. About Image recognition bot: User sends a photo and bot de

Marina Polukoshko 1 Jan 27, 2022
a Deep Learning Framework for Text

DeLFT DeLFT (Deep Learning Framework for Text) is a Keras and TensorFlow framework for text processing, focusing on sequence labelling (e.g. named ent

Patrice Lopez 350 Dec 19, 2022
OCR, Object Detection, Number Plate, Real Time

README.md PrePareded anaconda env requirements.txt clova AI → deep text recognition → trained weights (ex, .pth) wpod-net weights (ex, .h5 , .json) ht

Kaven Lee 7 Dec 06, 2022
Rest API Written In Python To Classify NSFW Images.

✨ NSFW Classifier API ✨ Rest API Written In Python To Classify NSFW Images. Fastest Solution If you don't want to selfhost it, there's already an inst

Akshay Rajput 23 Dec 30, 2022
CNN+Attention+Seq2Seq

Attention_OCR CNN+Attention+Seq2Seq The model and its tensor transformation are shown in the figure below It is necessary ch_ train and ch_ test the p

Tsukinousag1 2 Jul 14, 2022
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

RepMLP RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition Released the code of RepMLP together with an example o

260 Jan 03, 2023
A small C++ implementation of LSTM networks, focused on OCR.

clstm CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations. Status and sco

Tom 794 Dec 30, 2022
[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

ADAPET This repository contains the official code for the paper: "Improving and Simplifying Pattern Exploiting Training". The model improves and simpl

Rakesh R Menon 138 Dec 26, 2022
question‘s area recognition using image processing and regular expression

======================================== Paper-Question-recognition ======================================== question‘s area recognition using image p

Yuta Mizuki 7 Dec 27, 2021
Text layer for bio-image annotation.

napari-text-layer Napari text layer for bio-image annotation. Installation You can install using pip: pip install napari-text-layer Keybindings and m

6 Sep 29, 2022