Papers, datasets, algorithms, and SOTA results for scene text recognition (STR). Maintained long-term.

Overview

Scene Text Recognition Recommendations


Everything about Scene Text Recognition

SOTA Papers Datasets Code

Contents

1.Papers

All papers can be found here

  • Latest Papers:
up to (2021-12-8)
up to (2021-12-3)
up to (2021-11-25)

2.Datasets

2.1 Synthetic Datasets

Dataset | Description | Examples | BaiduNetdisk link
SynthText | 6 million synthetic text instances. It is a successor to MJSynth. | SynthText | Scene text datasets (extraction code: emco)
MJSynth | 9 million synthetic text instance images from a set of 90k common English words. Words are rendered onto natural images with random transformations. | MJText | Scene text datasets (extraction code: emco)

2.2 Benchmarks

Dataset | Description | Examples | BaiduNetdisk link
IIIT5k-Words (IIIT5K) | 3000 test image instances, taken from street scenes and from originally-digital images. | IIIT5K | Scene text datasets (extraction code: emco)
Street View Text (SVT) | 647 test image instances. Some images are severely corrupted by noise, blur, and low resolution. | SVT | Scene text datasets (extraction code: emco)
StreetViewText-Perspective (SVT-P) | 639 test image instances. It is specifically designed to evaluate perspective-distorted text recognition. It is built from the original SVT dataset by selecting images at the same address on Google Street View but with different view angles, so most text instances are heavily distorted by the non-frontal view angle. | SVTP | Scene text datasets (extraction code: emco)
ICDAR 2003 (IC03) | 867 test image instances. | IC03 | Scene text datasets (extraction code: mfir)
ICDAR 2013 (IC13) | 1015 test image instances. | IC13 | Scene text datasets (extraction code: emco)
ICDAR 2015 (IC15) | 2077 test image instances. Because the images were taken with Google Glass without ensuring image quality, most of the text is very small, blurred, and multi-oriented. | IC15 | Scene text datasets (extraction code: emco)
CUTE80 (CUTE) | 288 test image instances. It focuses on curved text recognition. Most images in CUTE have a complex background, perspective distortion, and poor resolution. | CUTE | Scene text datasets (extraction code: emco)
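
Results on these benchmarks are usually reported as word-level accuracy. The sketch below shows one way such a score could be computed. It assumes a case-insensitive comparison restricted to letters and digits, which is a common protocol but is not stated in this list; the function names and the toy data are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed protocol)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions, ground_truths) -> float:
    """Percentage of images whose normalized prediction equals the label."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0
    correct = sum(
        normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths)
    )
    return 100.0 * correct / len(ground_truths)


# Hypothetical toy data, not taken from any benchmark.
preds = ["Hello", "w0rld", "STREET", "cafe!"]
labels = ["hello", "world", "Street", "Cafe"]
print(f"{word_accuracy(preds, labels):.1f}")
```

Papers differ on whether punctuation and case count, so the normalization step is the part to adjust when comparing against a specific published number.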

3.Public Code

3.1 Frameworks

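PaddleOCR (Baidu)

  • PaddlePaddle/PaddleOCR
  • Features (excerpted from PaddleOCR):
    • Built on PaddlePaddle, Baidu's in-house deep learning framework
    • High-quality pretrained PP-OCR models with accurate recognition
      • Ultra-lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
      • Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
      • General-purpose PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
      • Supports recognition of mixed Chinese, English, and digits, vertical text, and long text
      • Supports multilingual recognition: Korean, Japanese, German, French
    • Rich, easy-to-use OCR tooling
      • Semi-automatic annotation tool PPOCRLabel: fast, efficient data labeling
      • Data synthesis tool Style-Text: batch-synthesizes large numbers of images resembling the target scene
      • Document analysis with PP-Structure: layout analysis and table recognition
    • Supports custom training and provides a range of inference and deployment options
    • Installable via pip
    • Runs on Linux, Windows, macOS, and other systems
  • Supported algorithms (recognition):
    • CRNN
    • Rosetta
    • STAR-Net
    • RARE
    • SRN
    • NRTR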
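
The model sizes listed for the three PP-OCR series above can be cross-checked with a few lines of Python. This is only an arithmetic sanity check on the figures quoted in this list; the dictionary layout and function name are hypothetical and nothing here calls PaddleOCR itself.

```python
# Sizes as quoted above (unit "M"): detection, direction classifier,
# recognition, and the stated total for each series.
PP_OCR_SERIES = {
    "PP-OCRv2": (3.1, 1.4, 8.5, 13.0),
    "PP-OCR mobile": (3.0, 1.4, 5.0, 9.4),
    "PP-OCR server": (47.1, 1.4, 94.9, 143.4),
}


def check_totals(series):
    """Map each series name to whether its parts add up to the stated total."""
    results = {}
    for name, (det, cls, rec, total) in series.items():
        # Round to one decimal place to sidestep floating-point noise.
        results[name] = round(det + cls + rec, 1) == total
    return results


print(check_totals(PP_OCR_SERIES))
```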

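MMOCR (SenseTime)

  • open-mmlab/mmocr
  • Features (excerpted from MMOCR):
    • MMOCR is an open-source toolbox based on PyTorch and mmdetection, focused on text detection, text recognition, and related downstream tasks such as key information extraction. It is part of the OpenMMLab project.
    • The toolbox supports not only text detection and text recognition but also their downstream tasks, such as key information extraction.
  • Supported algorithms (recognition):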
    • CRNN (TPAMI'2016)
    • NRTR (ICDAR'2019)
    • RobustScanner (ECCV'2020)
    • SAR (AAAI'2019)
    • SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
    • SegOCR (Manuscript'2021)

Deep Text Recognition Benchmark (ClovaAI)


3.2 Algorithms

CRNN


ASTER

  • TensorFlow, official, 651 : bgshih/aster
    • Official implementation, written in TensorFlow
  • PyTorch, 535 : ayumuymk/aster.pytorch
    • PyTorch version; accuracy is noticeably higher than reported in the original paper

MORANv2

  • PyTorch, official, 572 : Canjie-Luo/MORAN_v2
    • MORAN v2: more stable single-stage training, a ResNet backbone, and a bidirectional decoder

4.SOTA

Regular datasets: IIIT, SVT, IC13. Irregular datasets: IC15, SVTP, CUTE. A "-" marks a result that is not reported.

Model | Year | IIIT | SVT | IC13(857) | IC13(1015) | IC15(1811) | IC15(2077) | SVTP | CUTE
CRNN | 2015 | 78.2 | 80.8 | - | 86.7 | - | - | - | -
ASTER(L2R) | 2018 | 92.67 | 91.16 | - | 90.74 | 76.1 | - | 78.76 | 76.39
CombBest | 2019 | 87.9 | 87.5 | 93.6 | 92.3 | 77.6 | 71.8 | 79.2 | 74
ESIR | 2019 | 93.3 | 90.2 | - | 91.3 | - | 76.9 | 79.6 | 83.3
SE-ASTER | 2020 | 93.8 | 89.6 | - | 92.8 | 80 | - | 81.4 | 83.6
DAN | 2020 | 94.3 | 89.2 | - | 93.9 | - | 74.5 | 80 | 84.4
RobustScanner | 2020 | 95.3 | 88.1 | - | 94.8 | - | 77.1 | 79.5 | 90.3
AutoSTR | 2020 | 94.7 | 90.9 | - | 94.2 | 81.8 | - | 81.7 | -
Yang et al. | 2020 | 94.7 | 88.9 | - | 93.2 | 79.5 | 77.1 | 80.9 | 85.4
SATRN | 2020 | 92.8 | 91.3 | - | 94.1 | - | 79 | 86.5 | 87.8
SRN | 2020 | 94.8 | 91.5 | 95.5 | - | 82.7 | - | 85.1 | 87.8
GA-SPIN | 2021 | 95.2 | 90.9 | - | 94.8 | 82.8 | 79.5 | 83.2 | 87.5
PREN2D | 2021 | 95.6 | 94 | 96.4 | - | 83 | - | 87.6 | 91.7
Bhunia et al. | 2021 | 95.2 | 92.2 | - | 95.5 | - | 84 | 85.7 | 89.7
VisionLAN | 2021 | 95.8 | 91.7 | 95.7 | - | 83.7 | - | 86 | 88.5
ABINet | 2021 | 96.2 | 93.5 | 97.4 | - | 86.0 | - | 89.3 | 89.2
MATRN | 2021 | 96.7 | 94.9 | 97.9 | 95.8 | 86.6 | 82.9 | 90.5 | 94.1
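
To read the leaderboard programmatically, the rows can be parsed and the top entry per benchmark picked out. The following is a minimal sketch: the accuracies are copied from the table above, while the `ROWS` layout and the function names are hypothetical. One assumption is flagged in the code: the SE-ASTER row lists only seven values in the source, and the sketch treats IC15(2077) as the unreported one.

```python
COLUMNS = ["IIIT", "SVT", "IC13(857)", "IC13(1015)",
           "IC15(1811)", "IC15(2077)", "SVTP", "CUTE"]

# One row per model: name, then one value per column ("-" = not reported).
# Assumption: SE-ASTER's IC15(2077) entry is the cell missing in the source.
ROWS = """
CRNN|78.2 80.8 - 86.7 - - - -
ASTER(L2R)|92.67 91.16 - 90.74 76.1 - 78.76 76.39
CombBest|87.9 87.5 93.6 92.3 77.6 71.8 79.2 74
ESIR|93.3 90.2 - 91.3 - 76.9 79.6 83.3
SE-ASTER|93.8 89.6 - 92.8 80 - 81.4 83.6
DAN|94.3 89.2 - 93.9 - 74.5 80 84.4
RobustScanner|95.3 88.1 - 94.8 - 77.1 79.5 90.3
AutoSTR|94.7 90.9 - 94.2 81.8 - 81.7 -
Yang et al.|94.7 88.9 - 93.2 79.5 77.1 80.9 85.4
SATRN|92.8 91.3 - 94.1 - 79 86.5 87.8
SRN|94.8 91.5 95.5 - 82.7 - 85.1 87.8
GA-SPIN|95.2 90.9 - 94.8 82.8 79.5 83.2 87.5
PREN2D|95.6 94 96.4 - 83 - 87.6 91.7
Bhunia et al.|95.2 92.2 - 95.5 - 84 85.7 89.7
VisionLAN|95.8 91.7 95.7 - 83.7 - 86 88.5
ABINet|96.2 93.5 97.4 - 86.0 - 89.3 89.2
MATRN|96.7 94.9 97.9 95.8 86.6 82.9 90.5 94.1
"""


def parse_rows(text):
    """Turn the pipe-and-space rows into {model: {benchmark: float or None}}."""
    table = {}
    for line in text.strip().splitlines():
        model, cells = line.split("|")
        values = [None if c == "-" else float(c) for c in cells.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"unexpected number of cells for {model}")
        table[model.strip()] = dict(zip(COLUMNS, values))
    return table


def best_per_benchmark(table):
    """Return {benchmark: (model, accuracy)} using only reported values."""
    best = {}
    for col in COLUMNS:
        scored = [(row[col], m) for m, row in table.items() if row[col] is not None]
        if scored:
            score, model = max(scored)
            best[col] = (model, score)
    return best


best = best_per_benchmark(parse_rows(ROWS))
for benchmark, (model, score) in best.items():
    print(f"{benchmark}: {model} ({score})")
```

Because many cells are unreported, a model can only lead a column it was evaluated on, so the per-benchmark winner is not a ranking of the methods overall.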

Baek's Reimplementation Version

[Image: results of Baek's reimplementation]

Owner
Deep Learning and Vision Computing Lab, SCUT